ACYL COENZYME A THIOESTERASES

This application claims priority from provisional application U.S. Serial No.
60/220,028, filed on July 21, 2000. 



FIELD OF THE INVENTION 

The present invention provides compositions and methods related to acyl-coenzyme
A thioesterases. In particular, the present invention is related to plant acyl coenzyme A
thioesterases.

BACKGROUND OF THE INVENTION 

Acyl coenzyme A (acyl-CoA) thioesterases (ACHs) are enzymes that cleave thioester
bonds between CoA and a fatty acyl group. These enzymes, also referred to as acyl-CoA
hydrolases, have been identified in all animal and bacterial organisms studied to date,
including E. coli and humans. Eukaryotic cells contain various isoforms of these enzymes,
which are located in various organelles, most notably mitochondria, peroxisomes, and the
endoplasmic reticulum, as well as the cytosol. Although their true physiological role is not
currently understood, studies conducted in yeast have indicated that deletion of the
peroxisomal form leads to decreased growth when oleic acid is provided as the carbon
source. Other studies in rats indicate that other isoforms exhibit increased expression under
conditions where fatty acids are being broken down (for example, during fasting periods).
Overall, the experimental evidence gathered to date indicates that these enzymes have a role
in lipid oxidation and may be involved in regulation of the CoA pool within cells. Thus,
there remains a need for purified forms of these enzymes, as well as nucleic acid and
polypeptide sequences, such that the function of these enzymes can be further elucidated and
methods developed for regulation of fatty acid metabolism.



SUMMARY OF THE INVENTION

The present invention provides compounds, compositions and methods related to 
acyl-coenzyme A thioesterases. In particular, the present invention is related to plant acyl 
coenzyme A thioesterases. 

The present invention provides an isolated nucleic acid sequence comprising SEQ ID
NO:1 or SEQ ID NO:2 or SEQ ID NO:3 or SEQ ID NO:4 or a nucleic acid sequence that
hybridizes to at least one of the foregoing sequences under conditions of medium to high
stringency. The present invention also provides an isolated nucleic acid sequence encoding a
protein comprising an amino acid sequence of SEQ ID NO:5 or SEQ ID NO:6 or SEQ ID
NO:7 or SEQ ID NO:8.

In some embodiments of the present invention, the nucleic acid sequences 
described above are operably linked to a heterologous promoter. In further embodiments, 
the sequences described above are contained within a vector. In some embodiments, the 
nucleic acid sequence in a vector is in a sense orientation; in other embodiments, the nucleic
acid sequence in a vector is in an antisense orientation. The present invention also provides 
compositions comprising any of the nucleic acid sequences and vectors as described above. 

In still further embodiments, the present invention provides a host cell transfected 
with any of the nucleic acid sequences or vectors or compositions as described above. The 
present invention is not limited to any particular host cell. Indeed, a variety of host cells are
contemplated, including, but not limited to, prokaryotic cells, eukaryotic cells, plant tissue
cells, and cells in planta. The present invention also provides a plant transfected with any of
the nucleic acid sequences or vectors or compositions as described above. In some
embodiments, the present invention provides a seed from the transfected plant; in other
embodiments, the present invention provides oil from the transfected plant.

In yet other embodiments, the present invention provides a nucleic acid sequence 
or vector or composition as described above for use in transforming a plant or for use in 
altering a phenotype of a plant. In some embodiments, the present invention provides 
methods for making a transgenic plant comprising providing a nucleic acid sequence or 
vector or composition as described above, and plant tissue, and transfecting the plant tissue
with the nucleic acid sequence or vector or composition under conditions such that a
transgenic plant is generated. In other embodiments, the present invention further provides
methods for altering a phenotype of a plant comprising providing a nucleic acid sequence or
vector or composition as described above, and plant tissue, and transfecting the plant tissue
with the nucleic acid sequence or vector or composition under conditions such that the
phenotype is expressed. 

The present invention also provides methods for assaying acyl-CoA thioesterase
activity comprising providing a nucleic acid sequence or vector or composition as described
above, expressing the nucleic acid sequence under conditions such that a protein is produced,
and assaying the activity of the protein.

The present invention also provides methods for producing variants of acyl-CoA 
thioesterases comprising providing a nucleic acid sequence as described above, 
mutagenizing the sequence, and screening a variant encoded by the mutagenized nucleic acid
sequence for acyl-CoA thioesterase activity.

The present invention further provides an isolated nucleic acid sequence encoding 
one or more plant acyl-CoA thioesterase motifs, wherein the motif is a cGMP binding 
domain; in some embodiments, the motif is SEQ ID NO:11 or SEQ ID NO:12.

The present invention also provides a first isolated nucleic acid encoding a plant
acyl-CoA thioesterase, wherein the plant acyl-CoA thioesterase competes for binding to an
acyl-CoA substrate with a protein encoded by a second nucleic acid sequence which is a
nucleic acid sequence as described above.

The present invention also provides a composition comprising a first nucleic acid that 
inhibits the binding of at least a portion of a second nucleic acid to its complementary
sequence, where the second nucleic acid sequence is any of the nucleic acid sequences 
described above. 

In other embodiments, the present invention provides a purified protein comprising 
an amino acid sequence encoded by any of the nucleic acid sequences described above or 
having any of SEQ ID NOs:5-8, and portions thereof, as well as compositions comprising
such a purified protein. 

The present invention also provides a compound of a nucleic acid sequence or a
vector as described above substantially as described herein in any of the examples. 




DESCRIPTION OF THE FIGURES 

Figure 1 shows the ACH1 cDNA sequence (SEQ ID NO:1).
Figure 2 shows the ACH2 cDNA sequence (SEQ ID NO:2).
Figure 3 shows the ACH4 cDNA sequence (SEQ ID NO:3).
Figure 4 shows the ACH5 cDNA sequence (SEQ ID NO:4).

Figure 5, Panel A shows the ACH1 peptide sequence (SEQ ID NO:5). The
putative cGMP binding domain (SEQ ID NO:11) is boxed and indicated by highlighting.

Figure 5, Panel B shows the ACH2 peptide sequence (SEQ ID NO:6). The
putative cGMP binding domain (SEQ ID NO:12) is boxed and indicated by highlighting.

Figure 6, Panel A shows the ACH4 peptide sequence (SEQ ID NO:7).

Figure 6, Panel B shows the ACH5 peptide sequence (SEQ ID NO:8).

Figure 7 shows a histogram of the activity of ACH2-MBP and MBP.

Figure 8 shows a histogram of the activity of ACH2/6 His; panel A shows the
activity in the presence of a constant amount of BSA, and panel B shows the activity in the
presence of optimal concentrations of BSA for each substrate.

Figure 9 shows a histogram of the activity of lysates from cells transformed
with either pET24d/ACH5-6His or pET24d.

DESCRIPTION OF THE INVENTION 

The present invention provides compositions and methods related to acyl-coenzyme
A thioesterases. In particular, the present invention is related to plant acyl-coenzyme
A thioesterases. The present invention encompasses both native and recombinant
wild-type forms of the enzymes, as well as mutant and variant forms, some of which possess
altered characteristics relative to the wild-type enzyme. The present invention also relates to
methods of using acyl-CoA thioesterases, including altered expression in transgenic plants
and expression in prokaryotes and cell culture systems. After the "Definitions," the
following Description of the Invention is divided into: I. Acyl-CoA Thioesterases; and II.
Uses of Acyl-CoA Thioesterase Nucleic Acids and Polypeptides.

Definitions 

To facilitate understanding of the invention, a number of terms are defined below.

The term "plant" as used herein refers to a plurality of plant cells which are largely
differentiated into a structure that is present at any stage of a plant's development. Such
structures include, but are not limited to, a fruit, shoot, stem, leaf, flower petal, etc. The term
"plant tissue" includes differentiated and undifferentiated tissues of plants including, but not
limited to, roots, shoots, leaves, pollen, seeds, tumor tissue and various types of cells in
culture (for example, single cells, protoplasts, embryos, callus, protocorm-like bodies, etc.).
Plant tissue may be in planta, in organ culture, tissue culture, or cell culture.

"Oil-producing species" herein refers to plant species which produce and store
triacylglycerol in specific organs, primarily in seeds. Such species include soybean (Glycine
max), rapeseed and canola (including Brassica napus and campestris), sunflower
(Helianthus annuus), cotton (Gossypium hirsutum), corn (Zea mays), cocoa (Theobroma
cacao), safflower (Carthamus tinctorius), oil palm (Elaeis guineensis), coconut palm (Cocos
nucifera), flax (Linum usitatissimum), castor (Ricinus communis) and peanut (Arachis
hypogaea). The group also includes non-agronomic species which are useful in developing
appropriate expression vectors such as tobacco, rapid cycling Brassica species, and
Arabidopsis thaliana, and wild species which may be a source of unique fatty acids.

As used herein, the terms "acyl-CoA thioesterase" and "acyl-CoA hydrolase"
(ACH) are used interchangeably to refer to an enzymatic activity that catalyzes the
hydrolysis of acyl-CoA, resulting in the formation of free fatty acid and reduced coenzyme A
(CoA). The terms "plant acyl-CoA thioesterase" or "plant acyl-CoA hydrolase" refer to an
acyl-CoA thioesterase derived from a plant. These terms encompass both acyl-CoA
thioesterases that are identical to wild type plant acyl-CoA thioesterases and those that are
derived from wild type plant acyl-CoA thioesterases (e.g., variants of plant acyl-CoA
thioesterases or chimeric genes constructed with portions of plant acyl-CoA thioesterase
coding regions).

As used herein, the term "acyl-CoA synthetase (ACS)" refers to an enzymatic
activity that catalyzes the formation of an acyl-CoA from a free fatty acid and coenzyme A
(CoA). As used herein, the term "plant acyl-CoA synthetase" refers to an acyl-CoA
synthetase derived from a plant.

The term "gene" as used herein, refers to a DNA sequence that comprises control
and coding sequences necessary for the production of a polypeptide or protein precursor.
The polypeptide can be encoded by a full length coding sequence or by any portion of the
coding sequence, as long as the desired protein activity is retained.

"Nucleoside," as used herein, refers to a compound consisting of a purine [guanine
(G) or adenine (A)] or pyrimidine [thymine (T), uracil (U), or cytosine (C)] base covalently
linked to a pentose, whereas "nucleotide" refers to a nucleoside phosphorylated at one of its
pentose hydroxyl groups.

A "nucleic acid," as used herein, is a covalently linked sequence of nucleotides in
which the 3' position of the pentose of one nucleotide is joined by a phosphodiester group to
the 5' position of the pentose of the next, and in which the nucleotide residues (bases) are
linked in specific sequence (in other words, a linear order of nucleotides). A
"polynucleotide," as used herein, is a nucleic acid containing a sequence that is greater than
about 100 nucleotides in length. An "oligonucleotide," as used herein, is a short
polynucleotide or a portion of a polynucleotide. An oligonucleotide typically contains a
sequence of about two to about one hundred bases. The word "oligo" is sometimes used in
place of the word "oligonucleotide".

Nucleic acid molecules are said to have a "5'-terminus" (5' end) and a "3'-terminus"
(3' end) because nucleic acid phosphodiester linkages occur to the 5' carbon and 3'
carbon of the pentose ring of the substituent mononucleotides. The end of a nucleic acid at
which a new linkage would be to a 5' carbon is its 5' terminal nucleotide. The end of a
nucleic acid at which a new linkage would be to a 3' carbon is its 3' terminal nucleotide. A
terminal nucleotide, as used herein, is the nucleotide at the end position of the 3'- or
5'-terminus.

DNA molecules are said to have "5' ends" and "3' ends" because mononucleotides
are reacted to make oligonucleotides in a manner such that the 5' phosphate of one
mononucleotide pentose ring is attached to the 3' oxygen of its neighbor in one direction via
a phosphodiester linkage. Therefore, an end of an oligonucleotide is referred to as the "5' end"
if its 5' phosphate is not linked to the 3' oxygen of a mononucleotide pentose ring and as the
"3' end" if its 3' oxygen is not linked to a 5' phosphate of a subsequent mononucleotide
pentose ring.

"Nucleic acid sequence" as used herein refers to an oligonucleotide, nucleotide or
polynucleotide, and fragments or portions thereof, and to DNA or RNA of genomic or
synthetic origin which may be single- or double-stranded, and represent the sense or
antisense strand. Similarly, "amino acid sequence" as used herein refers to a peptide or
protein sequence. "Peptide nucleic acid" as used herein refers to an oligomeric molecule in
which nucleosides are joined by peptide, rather than phosphodiester, linkages. These small
molecules, also designated anti-gene agents, stop transcript elongation by binding to their
complementary (template) strand of nucleic acid (Nielsen et al. (1993) Anticancer Drug
Des., 8:53-63).

A "deletion" is defined as a change in either nucleotide or amino acid sequence in
which one or more nucleotides or amino acid residues, respectively, are absent.

An "insertion" or "addition" is that change in a nucleotide or amino acid sequence
which has resulted in the addition of one or more nucleotides or amino acid residues,
respectively, as compared to naturally occurring sequences.

A "substitution" results from the replacement of one or more nucleotides or amino
acids by different nucleotides or amino acids, respectively.

As used herein, a nucleic acid sequence, even if internal to a larger oligonucleotide
or polynucleotide, also may be said to have 5' and 3' ends. In either a linear or circular DNA
molecule, discrete elements are referred to as being "upstream" or 5' of the "downstream" or
3' elements. This terminology reflects the fact that transcription proceeds in a 5' to 3' fashion
along the DNA strand. Promoter and enhancer elements that direct transcription
of a linked gene are generally located 5' or upstream of the coding region. However,
enhancer elements can exert their effect even when located 3' of the promoter element and
the coding region. Transcription termination and polyadenylation signals are located 3' or
downstream of the coding region.

The term "wild-type" when made in reference to a gene refers to a gene which has
the characteristics of a gene isolated from a naturally occurring source. The term "wild-type"
when made in reference to a gene product refers to a gene product which has the
characteristics of a gene product isolated from a naturally occurring source. A wild-type
gene is that which is most frequently observed in a population and is thus arbitrarily
designated the "normal" or "wild-type" form of the gene. In contrast, the term "modified" or
"mutant" when made in reference to a gene or to a gene product refers, respectively, to a
gene or to a gene product which displays modifications in sequence and/or functional
properties (in other words, altered characteristics) when compared to the wild-type gene or
gene product. It is noted that naturally-occurring mutants can be isolated; these are
identified by the fact that they have altered characteristics when compared to the wild-type
gene or gene product.

The term "antisense" as used herein refers to a deoxyribonucleotide sequence
whose sequence of deoxyribonucleotide residues is in reverse 5' to 3' orientation in relation
to the sequence of deoxyribonucleotide residues in a sense strand of a DNA duplex. A
"sense strand" of a DNA duplex refers to a strand in a DNA duplex which is transcribed by a
cell in its natural state into a "sense mRNA." Thus an "antisense" sequence is a sequence
having the same sequence as the non-coding strand in a DNA duplex. The term "antisense
RNA" refers to an RNA transcript that is complementary to all or part of a target primary
transcript or mRNA and that blocks the expression of a target gene by interfering with the
processing, transport and/or translation of its primary transcript or mRNA. The
complementarity of an antisense RNA may be with any part of the specific gene transcript
(in other words, at the 5' non-coding sequence, 3' non-coding sequence, introns, or the
coding sequence). Once an antisense RNA is introduced into a cell, this transcribed strand
combines with natural mRNA produced by the cell to form duplexes. These duplexes then
block either the further transcription of the mRNA or its translation. In this manner, mutant
phenotypes may be generated. The term "antisense strand" is used in reference to a nucleic
acid strand that is complementary to the "sense" strand. The designation (-) (in other words,
"negative") is sometimes used in reference to the antisense strand, with the designation (+)
sometimes used in reference to the sense (in other words, "positive") strand. In addition, as
used herein, antisense RNA may contain regions of ribozyme sequences that increase the
efficacy of antisense RNA to block gene expression. "Ribozyme" refers to a catalytic RNA
and includes sequence-specific endoribonucleases. "Antisense inhibition" refers to the
production of antisense RNA transcripts capable of preventing the expression of the target
protein. Antisense DNA or RNA may be produced by any method, including synthesis by
splicing the gene(s) of interest in a reverse orientation to a viral promoter which permits the
synthesis of a coding strand.

As used herein, the term "overexpression" refers to the production of a gene
product in transgenic organisms that exceeds levels of production in normal or
non-transformed organisms. As used herein, the term "cosuppression" refers to the expression
of a foreign gene which has substantial homology to an endogenous gene resulting in the
suppression of expression of both the foreign and the endogenous gene. As used herein, the
term "altered levels" refers to the production of gene product(s) in transgenic organisms in
amounts or proportions that differ from that of normal or non-transformed organisms.

The term "recombinant" when made in reference to a DNA molecule refers to a
DNA molecule which is comprised of segments of DNA joined together by means of
molecular biological techniques. The term "recombinant" when made in reference to a
protein or a polypeptide refers to a protein molecule which is expressed using a recombinant
DNA molecule.

As used herein, the terms "restriction endonucleases" and "restriction enzymes"
refer to bacterial enzymes, each of which cuts double-stranded DNA at or near a specific
nucleotide sequence.

The term "nucleotide sequence of interest" refers to any nucleotide sequence, the
manipulation of which may be deemed desirable for any reason (for example, to confer
improved qualities), by one of ordinary skill in the art. Such nucleotide sequences include,
but are not limited to, coding sequences of structural genes (for example, reporter genes,
selection marker genes, oncogenes, drug resistance genes, growth factors, etc.), and
non-coding regulatory sequences which do not encode an mRNA or protein product (for
example, promoter sequence, polyadenylation sequence, termination sequence, enhancer
sequence, etc.).

As used herein the term "coding region" when used in reference to a structural gene
refers to the nucleotide sequences which encode the amino acids found in the nascent
polypeptide as a result of translation of a mRNA molecule. Typically, the coding region is
bounded on the 5' side by the nucleotide triplet "ATG" which encodes the initiator
methionine and on the 3' side by a stop codon (e.g., TAA, TAG, TGA). In some cases the
coding region is also known to initiate by a nucleotide triplet "TTG".
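
By way of illustration only, the bounding of a coding region by an initiator triplet and an
in-frame stop codon can be sketched in Python. The function name find_coding_region and
the example sequence are hypothetical and are not part of the disclosure; the sketch scans a
single strand in the 5' to 3' direction and makes no attempt to handle introns or the
alternative "TTG" initiator unless it is passed in explicitly.

```python
# Stop codons named in the definition of "coding region".
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_coding_region(dna, start_codon="ATG"):
    """Return the first stretch that begins with start_codon and ends with
    an in-frame stop codon, or None if no such stretch exists."""
    dna = dna.upper()
    start = dna.find(start_codon)
    while start != -1:
        # Walk codon by codon in the reading frame set by the start codon.
        for i in range(start + 3, len(dna) - 2, 3):
            if dna[i:i + 3] in STOP_CODONS:
                return dna[start:i + 3]
        # No in-frame stop found; try the next occurrence of the start codon.
        start = dna.find(start_codon, start + 1)
    return None


# Hypothetical example sequence with two flanking bases on each side.
print(find_coding_region("CCATGAAATTTTAGGG"))
```

Passing start_codon="TTG" would model the less common initiator mentioned above.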

As used herein, the terms "complementary" or "complementarity" when used in
reference to polynucleotides refer to polynucleotides which are related by the base-pairing
rules. For example, the sequence 5'-AGT-3' is complementary to the sequence 5'-ACT-3'.
Complementarity may be "partial," in which only some of the nucleic acids' bases are
matched according to the base pairing rules. Or, there may be "complete" or "total"
complementarity between the nucleic acids. The degree of complementarity between nucleic
acid strands has significant effects on the efficiency and strength of hybridization between
nucleic acid strands. This is of particular importance in amplification reactions, as well as
detection methods which depend upon binding between nucleic acids.
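
As a minimal sketch of the base-pairing rules, assuming DNA sequences written 5' to 3'
and only the four standard bases, complete and partial complementarity can be checked as
follows. The function names are hypothetical and chosen for illustration.

```python
# Watson-Crick pairing for DNA bases.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the complementary strand, written 5' to 3'."""
    return "".join(PAIRS[base] for base in reversed(seq.upper()))


def complementarity_fraction(a, b):
    """Fraction of positions at which equal-length strands a and b pair
    when aligned antiparallel (1.0 means complete complementarity)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    b_reversed = b.upper()[::-1]
    matched = sum(1 for x, y in zip(a.upper(), b_reversed) if PAIRS[x] == y)
    return matched / len(a)


print(reverse_complement("AGT"))               # the 5'-AGT-3' example above
print(complementarity_fraction("AGT", "ACT"))  # complete complementarity
print(complementarity_fraction("AGT", "ACC"))  # partial complementarity
```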

A "complement" of a nucleic acid sequence as used herein refers to a nucleotide 
sequence whose nucleic acids show total complementarity to the nucleic acids of the nucleic 
acid sequence. 

The term "homology" when used in relation to nucleic acids refers to a degree of
complementarity. There may be partial homology or complete homology (in other words,
identity). "Sequence identity" refers to a measure of relatedness between two or more
nucleic acids or proteins, and is given as a percentage with reference to the total comparison
length. The identity calculation takes into account those nucleotide or amino acid residues
that are identical and in the same relative positions in their respective larger sequences.
Calculations of identity may be performed by algorithms contained within computer
programs such as "GAP" (Genetics Computer Group, Madison, Wis.) and "ALIGN"
(DNAStar, Madison, Wis.). A partially complementary sequence that at least partially
inhibits a completely complementary sequence from hybridizing to a target nucleic acid is
referred to using the functional term "substantially homologous." The inhibition of
hybridization of the completely complementary sequence to the target sequence may be
examined using a hybridization assay (Southern or Northern blot, solution hybridization and
the like) under conditions of low stringency. A substantially homologous sequence or probe
will compete for and inhibit the binding (in other words, the hybridization) of a sequence
which is completely homologous to a target under conditions of low stringency. This is not
to say that conditions of low stringency are such that non-specific binding is permitted; low
stringency conditions require that the binding of two sequences to one another be a specific
(in other words, selective) interaction. The absence of non-specific binding may be tested by
the use of a second target which lacks even a partial degree of complementarity (for
example, less than about 30 percent identity); in the absence of non-specific binding the
probe will not hybridize to the second non-complementary target.
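
For illustration, percent identity over a fixed comparison length can be sketched as below.
This is a simplified assumption: the two sequences are taken to be already aligned and of
equal length, whereas programs such as GAP and ALIGN also introduce gaps, which this
sketch does not attempt. The function name and example sequences are hypothetical.

```python
def percent_identity(a, b):
    """Percentage of positions at which two pre-aligned, equal-length
    sequences carry the same residue."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    identical = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y)
    return 100.0 * identical / len(a)


# Hypothetical eight-base comparison with a single mismatch.
print(percent_identity("ACGTACGT", "ACGTTCGT"))
```

A value computed this way could be compared against the figure of about 30 percent
identity given above for a non-complementary control target.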

When used in reference to a double-stranded nucleic acid sequence such as a
cDNA or genomic clone, the term "substantially homologous" refers to any probe which can
hybridize to either or both strands of the double-stranded nucleic acid sequence under
conditions of low stringency as described herein.

The art knows well that numerous equivalent conditions may be employed to
comprise either low or high stringency conditions; factors such as the length and nature
(DNA, RNA, base composition) of the probe and nature of the target (DNA, RNA, base
composition, present in solution or immobilized, etc.) and the concentration of the salts and
other components (e.g., the presence or absence of formamide, dextran sulfate, polyethylene
glycol) are considered and the hybridization solution may be varied to generate conditions of
either low or high stringency hybridization different from, but equivalent to, the above listed
conditions. The term "hybridization" as used herein includes "any process by which a strand
of nucleic acid joins with a complementary strand through base pairing" (Coombs (1994)
Dictionary of Biotechnology, Stockton Press, New York NY).

As used herein, the term "Tm" is used in reference to the "melting temperature."
The melting temperature is the temperature at which a population of double-stranded nucleic
acid molecules becomes half dissociated into single strands. The equation for calculating the
Tm of nucleic acids is well known in the art. As indicated by standard references, a simple
estimate of the Tm value may be calculated by the equation: Tm = 81.5 + 0.41(percent G +
C), when a nucleic acid is in aqueous solution at 1 M NaCl (See for example, Anderson and
Young (1985) Quantitative Filter Hybridisation, in Nucleic Acid Hybridisation). Other
references include more sophisticated computations which take structural as well as
sequence characteristics into account for the calculation of Tm.

"Stringency" when used in reference to nucleic acid hybridization typically occurs in a
range from about Tm-5°C (5°C below the Tm of the probe) to about 20°C to 25°C below Tm.
As will be understood by those of skill in the art, a stringent hybridization can be used to
identify or detect identical polynucleotide sequences or to identify or detect similar or
related polynucleotide sequences.
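
The simple Tm estimate and the stringency range described above can be worked through
in a short Python sketch. The function names and the example probe are hypothetical; the
estimate assumes 1 M NaCl as stated above and, like the simple equation itself, ignores
probe length and structural effects.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a nucleic acid sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def estimate_tm(seq):
    """Simple estimate in degrees C: Tm = 81.5 + 0.41 * (percent G + C)."""
    return 81.5 + 0.41 * gc_percent(seq)


def stringency_window(tm):
    """Return (Tm - 25, Tm - 20, Tm - 5) in degrees C: the two lower
    bounds and the upper bound of the typical stringency range."""
    return (tm - 25.0, tm - 20.0, tm - 5.0)


probe = "ATGC" * 5  # hypothetical probe with 50 percent G + C
tm = estimate_tm(probe)
print(round(tm, 1), stringency_window(tm))
```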

Low stringency conditions when used in reference to nucleic acid hybridization
comprise conditions equivalent to binding or hybridization at 42°C in a solution consisting of
5X SSPE (43.8 g/l NaCl, 6.9 g/l NaH2PO4·H2O and 1.85 g/l EDTA, pH adjusted to 7.4 with
NaOH), 0.1 percent SDS, 5X Denhardt's reagent [50X Denhardt's contains per 500 ml: 5 g
Ficoll (Type 400, Pharmacia), 5 g BSA (Fraction V; Sigma)] and 100 μg/ml denatured
salmon sperm DNA followed by washing in a solution comprising 5X SSPE, 0.1 percent
SDS at 42°C when a probe of about 500 nucleotides in length is employed.

High stringency conditions when used in reference to nucleic acid hybridization
comprise conditions equivalent to binding or hybridization at 42°C in a solution consisting of
5X SSPE (43.8 g/l NaCl, 6.9 g/l NaH2PO4·H2O and 1.85 g/l EDTA, pH adjusted to 7.4 with
NaOH), 0.5 percent SDS, 5X Denhardt's reagent and 100 μg/ml denatured salmon sperm
DNA followed by washing in a solution comprising 0.1X SSPE, 1.0 percent SDS at 42°C
when a probe of about 500 nucleotides in length is employed.
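
As a worked example of the arithmetic only, the 5X SSPE composition given above can be
scaled to the 0.1X strength used in the high stringency wash. The sketch assumes simple
linear dilution; the dictionary and function names are hypothetical, and pH adjustment and
the SDS component are not modeled.

```python
# Grams per liter for 5X SSPE, as stated in the text above.
SSPE_5X_G_PER_L = {"NaCl": 43.8, "NaH2PO4.H2O": 6.9, "EDTA": 1.85}


def sspe_g_per_l(strength_x):
    """Scale the 5X recipe linearly to another strength (0.1 means 0.1X)."""
    if strength_x <= 0:
        raise ValueError("strength must be positive")
    factor = strength_x / 5.0
    return {name: grams * factor for name, grams in SSPE_5X_G_PER_L.items()}


# Compare the hybridization strength (5X) with the high stringency wash (0.1X).
for strength in (5.0, 0.1):
    recipe = sspe_g_per_l(strength)
    print(strength, {name: round(grams, 3) for name, grams in recipe.items()})
```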

When used in reference to nucleic acid hybridization the art knows well that
numerous equivalent conditions may be employed to comprise either low or high stringency
conditions; factors such as the length and nature (DNA, RNA, base composition) of the
probe and nature of the target (DNA, RNA, base composition, present in solution or
immobilized, etc.) and the concentration of the salts and other components (for example, the
presence or absence of formamide, dextran sulfate, polyethylene glycol) are considered and
the hybridization solution may be varied to generate conditions of either low or high
stringency hybridization different from, but equivalent to, the above listed conditions.

As used herein the term "hybridization complex" refers to a complex formed
between two nucleic acid sequences by virtue of the formation of hydrogen bonds between
complementary G and C bases and between complementary A and T bases; these hydrogen
bonds may be further stabilized by base stacking interactions. The two complementary
nucleic acid sequences hydrogen bond in an antiparallel configuration. A hybridization
complex may be formed in solution (e.g., C0t or R0t analysis) or between one nucleic acid
sequence present in solution and another nucleic acid sequence immobilized to a solid
support (e.g., a nylon membrane or a nitrocellulose filter as employed in Southern and
Northern blotting, dot blotting or a glass slide as employed in in situ hybridization, including
FISH [fluorescent in situ hybridization]).

"Alterations in the polynucleotide" as used herein comprise any alteration in the
sequence of polynucleotides encoding histidine kinase, including deletions, insertions, and
point mutations that may be detected using hybridization assays. Included within this
definition is the detection of alterations to the genomic DNA sequence which encodes
histidine kinase (e.g., by alterations in pattern of restriction enzyme fragments capable of
hybridizing to any sequence such as SEQ ID NOS:1-4, for example, RFLP analysis, the
inability of a selected fragment of any sequence to hybridize to a sample of genomic DNA,
e.g., using allele-specific oligonucleotide probes, improper or unexpected hybridization, such
as hybridization to a locus other than the normal chromosomal locus for the histidine kinase
gene, e.g., using FISH to metaphase chromosomes spreads, etc.).

The term "derivative" as used herein refers to the chemical modification of a
nucleic acid encoding acyl-CoA structures. Illustrative of such modifications would be
replacement of hydrogen by an alkyl, acyl, or amino group. A nucleic acid derivative would
encode a polypeptide which retains essential biological characteristics of naturally-occurring
acyl-CoA thioesterase.

A "variant" in regard to amino acid sequences is used to indicate an amino acid
sequence that differs by one or more amino acids from another, usually related amino acid
sequence. The variant may have "conservative" changes, wherein a substituted amino acid has
similar structural or chemical properties (e.g., replacement of leucine with isoleucine). More
rarely, a variant may have "non-conservative" changes, e.g., replacement of a glycine with a
tryptophan. Similar minor variations may also include amino acid deletions or insertions
(in other words, additions), or both. Guidance in determining which and how many amino acid
residues may be substituted, inserted or deleted without abolishing biological or immunological
activity may be found using computer programs well known in the art, for example,
DNAStar software. Thus, it is contemplated that this definition will encompass variants of
acyl-CoA thioesterases. Such variants can be tested in functional assays, such as growth
inhibition assays.

As used herein the term "portion" in reference to an amino acid sequence or a
protein (as in "a portion of an amino acid sequence") refers to fragments of that protein. The
fragments may range in size from four amino acid residues to the entire amino acid sequence
minus one amino acid. Thus, a protein "comprising at least a portion of an amino acid
sequence encoding an acyl-CoA thioesterase," encompasses the full-length acyl-CoA
thioesterase and fragments thereof. 
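The size range given for a "portion" can be illustrated with a short sketch. The function names and the seven-residue peptide below are hypothetical and are used only for illustration; the sketch simply enumerates every contiguous fragment from four residues up to the full length minus one residue.

```python
def portions(sequence, min_len=4):
    """Yield every contiguous fragment of `sequence` whose length runs from
    `min_len` residues up to the full length minus one residue."""
    for frag_len in range(min_len, len(sequence)):
        for start in range(len(sequence) - frag_len + 1):
            yield sequence[start:start + frag_len]


def count_portions(sequence_length, min_len=4):
    """Count those fragments without enumerating them."""
    return sum(sequence_length - frag_len + 1
               for frag_len in range(min_len, sequence_length))


# Hypothetical seven-residue peptide, used only as an illustration.
example = "MKTAYIA"
fragments = list(portions(example))
```

For the seven-residue example, fragment lengths of 4, 5, and 6 residues give 4 + 3 + 2 = 9 fragments; the full-length sequence itself is not returned by this helper.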

Polypeptide molecules are said to have an "amino terminus" (N-terminus) and a
"carboxy terminus" (C-terminus) because peptide linkages occur between the backbone
amino group of a first amino acid residue and the backbone carboxyl group of a second
amino acid residue. Typically, the terminus of a polypeptide at which a new linkage would
be formed is the carboxy-terminus of the growing polypeptide chain, and polypeptide sequences are
written from left to right beginning at the amino terminus. 

As used herein, the term "host cell" refers to any cell capable of expressing a 
functional gene and/or gene product introduced from another cell or organism. This
definition includes, but is not limited to, E. coli and other cells used as expression vectors to
produce acyl-CoA thioesterase, in particular plant acyl-CoA thioesterases. 

As used herein, the term "fusion protein" refers to a chimeric protein containing 
the protein of interest (for example, ACHs and fragments thereof) joined to an exogenous 
protein fragment (for example, the fusion partner which consists of a non-ACH protein).
The fusion partner may enhance the solubility of ACH protein as expressed in a host cell,
may provide an affinity tag to allow purification of the recombinant fusion protein from the
host cell or culture supernatant, or both. If desired, the fusion partner may be removed from
the protein of interest (for example, ACH or fragments thereof) by a variety of enzymatic or
chemical means known to the art.

As used herein, the term "transit peptide" refers to the N-terminal extension of a
protein that serves as a signal for uptake and transport of that protein into an organelle such
as a plastid or mitochondrion. 

The term "isolated" when used in relation to a nucleic or amino acid, as in "an 
isolated nucleic acid sequence" or "an isolated amino acid sequence" refers to a nucleic acid
sequence or amino acid that is identified and separated from at least one contaminant nucleic
acid or amino acid with which it is ordinarily associated in its natural source. Isolated
nucleic acid is nucleic acid present in a form or setting that is different from that in which it
is found in nature. In contrast, non-isolated nucleic acids are nucleic acids such as DNA and
RNA which are found in the state they exist in nature. For example, a given DNA sequence
(for example, a gene) is found on the host cell chromosome in proximity to neighboring
genes; RNA sequences, such as a specific mRNA sequence encoding a specific protein, are
found in the cell as a mixture with numerous other mRNAs which encode a multitude of
proteins. However, an isolated nucleic acid sequence comprising SEQ ID NO:1 includes, by
way of example, such nucleic acid sequences in cells which ordinarily contain SEQ ID NO:1
where the nucleic acid sequence is in a chromosomal or extrachromosomal location different
from that of natural cells, or is otherwise flanked by a different nucleic acid sequence than
that found in nature. The isolated nucleic acid sequence may be present in single-stranded or
double-stranded form. When an isolated nucleic acid sequence is to be utilized to express a
protein, the nucleic acid sequence will contain at a minimum at least a portion of the sense or
coding strand (in other words, the nucleic acid sequence may be single-stranded).
Alternatively, it may contain both the sense and anti-sense strands (in other words, the
nucleic acid sequence may be double-stranded).

As used herein, the term "purified" refers to molecules, either nucleic or amino
acid sequences, that are removed from their natural environment, isolated or separated. An
"isolated nucleic acid sequence" is therefore a purified nucleic acid sequence. "Substantially
purified" molecules are at least 60 percent free, preferably at least 75 percent free, and more
preferably at least 90 percent free from other components with which they are naturally
associated.

As used herein, the terms "vector" and "vehicle" are used interchangeably in 
reference to nucleic acid molecules that transfer DNA segment(s) from one cell to another. 
Vectors may include plasmids, bacteriophages, viruses, cosmids, and the like.

The term "expression vector" or "expression cassette" as used herein refers to a 
recombinant DNA molecule containing a desired coding sequence and appropriate nucleic
acid sequences necessary for the expression of the operably linked coding sequence in a
particular host organism. Nucleic acid sequences necessary for expression in prokaryotes
usually include a promoter, an operator (optional), and a ribosome binding site, often along
with other sequences. Eukaryotic cells are known to utilize promoters, enhancers, and
termination and polyadenylation signals.

The terms "targeting vector" or "targeting construct" refer to oligonucleotide
sequences comprising a gene of interest flanked on either side by a recognition sequence
which is capable of homologous recombination of the DNA sequence located between the
flanking recognition sequences.

The terms "in operable combination," "in operable order," and "operably linked" 
as used herein refer to the linkage of nucleic acid sequences in such a manner that a nucleic 
acid molecule capable of directing the transcription of a given gene and/or the synthesis of a
desired protein molecule is produced. The term also refers to the linkage of amino acid 
sequences in such a manner so that a functional protein is produced. 

The term "selectable marker" as used herein refers to a gene which encodes an
enzyme having an activity that confers resistance to an antibiotic or drug upon the cell in
which the selectable marker is expressed. Selectable markers may be "positive" or
"negative." Examples of positive selectable markers include the neomycin phosphotransferase
(NPTII) gene, which confers resistance to G418 and to kanamycin, and the bacterial
hygromycin phosphotransferase gene (hyg), which confers resistance to the antibiotic
hygromycin. Negative selectable markers encode an enzymatic activity whose expression is
cytotoxic to the cell when grown in an appropriate selective medium. For example, the
HSV-tk gene is commonly used as a negative selectable marker. Expression of the HSV-tk
gene in cells grown in the presence of gancyclovir or acyclovir is cytotoxic; thus, growth of
cells in selective medium containing gancyclovir or acyclovir selects against cells capable of
expressing a functional HSV TK enzyme. 
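The logic of positive and negative selection described above can be summarized in a small sketch. This is a deliberately simplified model and not a laboratory protocol: the dictionaries contain only the markers and agents named in this definition, the function name `survives` is hypothetical, and the sketch assumes that a selective drug kills every cell lacking a matching resistance marker.

```python
# Markers and agents taken from the definition above; the data layout is an
# illustrative assumption.
POSITIVE_MARKERS = {
    "NPTII": {"G418", "kanamycin"},
    "hyg": {"hygromycin"},
}
NEGATIVE_MARKERS = {
    "HSV-tk": {"gancyclovir", "acyclovir"},
}


def survives(expressed_markers, medium_agents):
    """Return True if a cell expressing `expressed_markers` is expected to
    grow in a medium containing `medium_agents` under this simplified model."""
    for agent in medium_agents:
        # Positive selection: a selective drug kills cells lacking resistance.
        agent_is_selective_drug = any(
            agent in drugs for drugs in POSITIVE_MARKERS.values())
        resistant = any(
            agent in POSITIVE_MARKERS.get(marker, ())
            for marker in expressed_markers)
        if agent_is_selective_drug and not resistant:
            return False
        # Negative selection: the marker makes the agent cytotoxic.
        sensitized = any(
            agent in NEGATIVE_MARKERS.get(marker, ())
            for marker in expressed_markers)
        if sensitized:
            return False
    return True
```

In this model a cell carrying only NPTII grows in medium containing G418 and gancyclovir, whereas a cell that also expresses HSV-tk does not.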

Transcriptional control signals in eukaryotes comprise "promoter" and "enhancer"
elements. Promoters and enhancers consist of short arrays of DNA sequences that interact
specifically with cellular proteins involved in transcription (Maniatis, et al., Science
236:1237, 1987). Promoter and enhancer elements have been isolated from a variety of
eukaryotic sources including genes in yeast, insect, mammalian and plant cells. Promoter
and enhancer elements have also been isolated from viruses, and analogous control elements,
such as promoters, are also found in prokaryotes. The selection of a particular promoter and
enhancer depends on the cell type used to express the protein of interest. Some eukaryotic
promoters and enhancers have a broad host range while others are functional in a limited
subset of cell types (for review, see Voss, et al., Trends Biochem. Sci., 11:287, 1986; and
Maniatis, et al., supra, 1987).

The terms "promoter element," "promoter," or "promoter sequence" as used 
herein, refer to a DNA sequence that is located at the 5' end of (in other words, precedes) the
protein coding region of a DNA polymer. The location of most promoters known in nature
precedes the transcribed region. The promoter functions as a switch, activating the
expression of a gene. If the gene is activated, it is said to be transcribed, or participating in
transcription. Transcription involves the synthesis of mRNA from the gene. The promoter,
therefore, serves as a transcriptional regulatory element and also provides a site for initiation
of transcription of the gene into mRNA.

Promoters may be tissue specific or cell specific. The term "tissue specific" as it 
applies to a promoter refers to a promoter that is capable of directing selective expression of 
a nucleotide sequence of interest to a specific type of tissue (for example, seeds) in the 
relative absence of expression of the same nucleotide sequence of interest in a different type
of tissue (for example, leaves). Tissue specificity of a promoter may be evaluated by, for
example, operably linking a reporter gene to the promoter sequence to generate a reporter
construct, introducing the reporter construct into the genome of a plant such that the reporter
construct is integrated into every tissue of the resulting transgenic plant, and detecting the
expression of the reporter gene (for example, detecting mRNA, protein, or the activity of a
protein encoded by the reporter gene) in different tissues of the transgenic plant. The
detection of a greater level of expression of the reporter gene in one or more tissues relative
to the level of expression of the reporter gene in other tissues shows that the promoter is
specific for the tissues in which greater levels of expression are detected. The term "cell type
specific" as applied to a promoter refers to a promoter which is capable of directing selective
expression of a nucleotide sequence of interest in a specific type of cell in the relative
absence of expression of the same nucleotide sequence of interest in a different type of cell
within the same tissue. The term "cell type specific" when applied to a promoter also means
a promoter capable of promoting selective expression of a nucleotide sequence of interest in
a region within a single tissue. Cell type specificity of a promoter may be assessed using
methods well known in the art, for example, immunohistochemical staining. Briefly, tissue
sections are embedded in paraffin, and paraffin sections are reacted with a primary antibody
which is specific for the polypeptide product encoded by the nucleotide sequence of interest
whose expression is controlled by the promoter. A labeled (for example, peroxidase
conjugated) secondary antibody which is specific for the primary antibody is allowed to bind
to the sectioned tissue and specific binding detected (for example, with avidin/biotin) by
microscopy. 

Promoters may be constitutive or regulatable. The term "constitutive" when made 
in reference to a promoter means that the promoter is capable of directing transcription of an 
operably linked nucleic acid sequence in the absence of a stimulus (for example, heat shock,
chemicals, light, etc.). Typically, constitutive promoters are capable of directing expression
of a transgene in substantially any cell and any tissue. Exemplary constitutive plant
promoters include, but are not limited to, 35S Cauliflower Mosaic Virus (CaMV 35S; see for
example, U.S. Pat. No. 5,352,605), mannopine synthase, octopine synthase (ocs),
superpromoter (see for example, WO 95/14098), and ubi3 (see for example, Garbarino and
Belknap (1994) Plant Mol. Biol. 24:119-127) promoters. Such promoters have been used
successfully to direct the expression of heterologous nucleic acid sequences in transformed 
plant tissue. 

In contrast, a "regulatable" promoter is one which is capable of directing a level of
transcription of an operably linked nucleic acid sequence in the presence of a stimulus (for
example, heat shock, chemicals, light, etc.) which is different from the level of transcription 
of the operably linked nucleic acid sequence in the absence of the stimulus. 

As used herein, the term "regulatory element" refers to a genetic element that
controls some aspect of the expression of nucleic acid sequence(s). For example, a promoter
is a regulatory element that facilitates the initiation of transcription of an operably linked 
coding region. Other regulatory elements include splicing signals, polyadenylation signals, 
termination signals, etc. 

The enhancer and/or promoter may be "endogenous" or "exogenous" or
"heterologous." An "endogenous" enhancer or promoter is one that is naturally linked with a
given gene in the genome. An "exogenous" or "heterologous" enhancer or promoter is one 
that is placed in juxtaposition to a gene by means of genetic manipulation (in other words, 
molecular biological techniques) such that transcription of the gene is directed by the linked 
enhancer or promoter. 

The presence of "splicing signals" on an expression vector often results in higher
levels of expression of the recombinant transcript in eukaryotic host cells. Splicing signals
mediate the removal of introns from the primary RNA transcript and consist of a splice
donor and acceptor site (Sambrook, et al. (1989) Molecular Cloning: A Laboratory Manual,
2nd ed. (Cold Spring Harbor Laboratory Press, New York) pp. 16.7-16.8). A commonly
used splice donor and acceptor site is the splice junction from the 16S RNA of SV40.

Efficient expression of recombinant DNA sequences in eukaryotic cells requires 
expression of signals directing the efficient termination and polyadenylation of the resulting
transcript. Transcription termination signals are generally found downstream of the
polyadenylation signal and are a few hundred nucleotides in length. The term "poly(A) site"
or "poly(A) sequence" as used herein denotes a DNA sequence which directs both the
termination and polyadenylation of the nascent RNA transcript. Efficient polyadenylation of
the recombinant transcript is desirable, as transcripts lacking a poly(A) tail are unstable and
are rapidly degraded. The poly(A) signal utilized in an expression vector may be
"heterologous" or "endogenous." An endogenous poly(A) signal is one that is found
naturally at the 3' end of the coding region of a given gene in the genome. A heterologous
poly(A) signal is one which has been isolated from one gene and positioned 3' to another
gene. A commonly used heterologous poly(A) signal is the SV40 poly(A) signal. The SV40
poly(A) signal is contained on a 237 bp BamHI/BclI restriction fragment and directs both
termination and polyadenylation (Sambrook, supra, at 16.6-16.7).

The terms "infecting" and "infection" with a bacterium refer to co-incubation of a 
target biological sample (for example, cell, tissue, etc.) with the bacterium under conditions
such that nucleic acid sequences contained within the bacterium are introduced into one or
more cells of the target biological sample.

The term "Agrobacterium" refers to a soil-borne, Gram-negative, rod-shaped
phytopathogenic bacterium which causes crown gall. The term "Agrobacterium" includes,
but is not limited to, the strains Agrobacterium tumefaciens (which typically causes crown
gall in infected plants), and Agrobacterium rhizogenes (which causes hairy root disease in
infected host plants). Infection of a plant cell with Agrobacterium generally results in the
production of opines (for example, nopaline, agropine, octopine etc.) by the infected cell.
Thus, Agrobacterium strains which cause production of nopaline (for example, strain
LBA4301, C58, A208) are referred to as "nopaline-type" Agrobacteria; Agrobacterium
strains which cause production of octopine (for example, strain LBA4404, Ach5, B6) are
referred to as "octopine-type" Agrobacteria; and Agrobacterium strains which cause
production of agropine (e.g., strain EHA105, EHA101, A281) are referred to as
"agropine-type" Agrobacteria.

The terms "bombarding," "bombardment," and "biolistic bombardment" refer to the
process of accelerating particles towards a target biological sample (for example, cell, tissue,
etc.) to effect wounding of the cell membrane of a cell in the target biological sample and/or
entry of the particles into the target biological sample. Methods for biolistic bombardment
are known in the art (for example, U.S. Patent No. 5,584,807), and are commercially
available (for example, the helium gas-driven microprojectile accelerator (PDS-1000/He,
BioRad)).

The term "microwounding" when made in reference to plant tissue refers to the 
introduction of microscopic wounds in that tissue. Microwounding may be achieved by, for 
example, particle bombardment as described herein.

The term "transfection" as used herein refers to the introduction of foreign DNA 
into eukaryotic cells. Transfection may be accomplished by a variety of means known to the 
art including calcium phosphate-DNA co-precipitation, DEAE-dextran-mediated 
transfection, polybrene-mediated transfection, electroporation, microinjection, liposome
fusion, lipofection, protoplast fusion, retroviral infection, and biolistics.

The term "transgenic" when used in reference to a cell refers to a cell which
contains a transgene, or whose genome has been altered by the introduction of a transgene.
The term "transgenic" when used in reference to a tissue or to a plant refers to a tissue or
plant, respectively, which comprises one or more cells that contain a transgene, or whose
genome has been altered by the introduction of a transgene. Transgenic cells, tissues and
plants may be produced by several methods including the introduction of a "transgene" 
comprising nucleic acid (usually DNA) into a target cell or integration of the transgene into a 
chromosome of a target cell by way of human intervention, such as by the methods described 
herein. 

The term "transgene" as used herein refers to any nucleic acid sequence which is
introduced into the genome of a cell by experimental manipulations. A transgene may be an
"endogenous DNA sequence," or a "heterologous DNA sequence" (in other words, "foreign
DNA"). The term "endogenous DNA sequence" refers to a nucleotide sequence which is
naturally found in the cell into which it is introduced so long as it does not contain some
modification (for example, a point mutation, the presence of a selectable marker gene, etc.)
relative to the naturally-occurring sequence. The term "heterologous DNA sequence" refers
to a nucleotide sequence which is ligated to, or is manipulated to become ligated to, a nucleic
acid sequence to which it is not ligated in nature, or to which it is ligated at a different
location in nature. Heterologous DNA is not endogenous to the cell into which it is
introduced, but has been obtained from another cell. Heterologous DNA also includes an
endogenous DNA sequence which contains some modification. Generally, although not
necessarily, heterologous DNA encodes RNA and proteins that are not normally produced by
the cell in which it is expressed. Examples of heterologous DNA include reporter genes,
transcriptional and translational regulatory sequences, selectable marker proteins (for
example, proteins which confer drug resistance), etc. 

The term "foreign gene" refers to any nucleic acid (for example, gene sequence) 
which is introduced into the genome of a cell by experimental manipulations and may 
include gene sequences found in that cell so long as the introduced gene contains some
modification (for example, a point mutation, the presence of a selectable marker gene, etc) 
relative to the naturally-occurring gene. 

The term "transformation" as used herein refers to the introduction of a transgene 
into a cell. Transformation of a cell may be stable or transient. The term "transient
transformation" or "transiently transformed" refers to the introduction of one or more
transgenes into a cell in the absence of integration of the transgene into the host cell's
genome. Transient transformation may be detected by, for example, enzyme-linked
immunosorbent assay (ELISA) which detects the presence of a polypeptide encoded by one
or more of the transgenes. Alternatively, transient transformation may be detected by
detecting the activity of the protein (for example, β-glucuronidase) encoded by the transgene.
The term "transient transformant" refers to a cell which has transiently incorporated one or
more transgenes. In contrast, the term "stable transformation" or "stably transformed" refers
to the introduction and integration of one or more transgenes into the genome of a cell.
Stable transformation of a cell may be detected by Southern blot hybridization of genomic
DNA of the cell with nucleic acid sequences which are capable of binding to one or more of
the transgenes. Alternatively, stable transformation of a cell may also be detected by the
polymerase chain reaction of genomic DNA of the cell to amplify transgene sequences. The
term "stable transformant" refers to a cell which has stably integrated one or more transgenes
into the genomic DNA. Thus, a stable transformant is distinguished from a transient
transformant in that, whereas genomic DNA from the stable transformant contains one or
more transgenes, genomic DNA from the transient transformant does not contain a 
transgene. 
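The distinction between stable and transient transformants drawn above can be written as a simple decision rule. The sketch below is illustrative only: the function name and its two boolean inputs are hypothetical, standing in for the outcome of an expression assay (for example, ELISA or an enzyme activity assay) and a genomic DNA assay (for example, Southern blot or PCR of genomic DNA).

```python
def classify_transformant(product_detected, transgene_in_genomic_dna):
    """Classify a cell from two assay outcomes.

    product_detected: True if the transgene-encoded polypeptide or its
        activity was detected (for example, by ELISA).
    transgene_in_genomic_dna: True if the transgene was detected in genomic
        DNA (for example, by Southern blot hybridization or PCR).
    """
    if transgene_in_genomic_dna:
        # Integration into the genome defines a stable transformant.
        return "stable transformant"
    if product_detected:
        # Expression without integration defines a transient transformant.
        return "transient transformant"
    return "no transformation detected"
```

The rule checks the genomic DNA result first, because integration is the feature that separates the two categories regardless of whether expression is also observed.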

The term "amplification" is defined as the production of additional copies of a 
nucleic acid sequence and is generally carried out using polymerase chain reaction
technologies well known in the art (Dieffenbach and GS Dveksler (1995) PCR Primer, a
Laboratory Manual, Cold Spring Harbor Press, Plainview NY). As used herein, the term
"polymerase chain reaction" ("PCR") refers to the methods disclosed in U.S. Patent Nos.
4,683,195, 4,683,202 and 4,965,188, which describe a method for increasing the
concentration of a segment of a target sequence in a mixture of genomic DNA without
cloning or purification. This process for amplifying the target sequence consists of
introducing a large excess of two oligonucleotide primers to the DNA mixture containing the
desired target sequence, followed by a precise sequence of thermal cycling in the presence of
a DNA polymerase. The two primers are complementary to their respective strands of the
double stranded target sequence. To effect amplification, the mixture is denatured and the
primers then annealed to their complementary sequences within the target molecule.
Following annealing, the primers are extended with a polymerase so as to form a new pair of
complementary strands. The steps of denaturation, primer annealing and polymerase
extension can be repeated many times (in other words, denaturation, annealing and extension
constitute one "cycle"; there can be numerous "cycles") to obtain a high concentration of an
amplified segment of the desired target sequence. The length of the amplified segment of the
desired target sequence is determined by the relative positions of the primers with respect to
each other, and therefore, this length is a controllable parameter. By virtue of the repeating
aspect of the process, the method is referred to as the "polymerase chain reaction"
(hereinafter "PCR"). Because the desired amplified segments of the target sequence become
the predominant sequences (in terms of concentration) in the mixture, they are said to be
"PCR amplified."

With PCR, it is possible to amplify a single copy of a specific target sequence in
genomic DNA to a level detectable by several different methodologies (for example,
hybridization with a labeled probe; incorporation of biotinylated primers followed by
avidin-enzyme conjugate detection; and/or incorporation of 32P-labeled deoxyribonucleotide
triphosphates, such as dGTP or dATP, into the amplified segment). In addition to genomic
DNA, any oligonucleotide sequence can be amplified with the appropriate set of primer
molecules. In particular, the amplified segments created by the PCR process itself are,
themselves, efficient templates for subsequent PCR amplifications. Amplified target
sequences may be used to obtain segments of DNA (for example, genes) for the construction
of targeting vectors, transgenes, etc. 
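Two quantitative points in the PCR description above lend themselves to a brief sketch: the length of the amplified segment follows from the primer positions, and repeated cycles multiply the copy number. The function names and coordinates are hypothetical. The per-cycle doubling is a textbook idealization that is assumed here and is not stated in this document; the `efficiency` parameter is likewise an illustrative assumption.

```python
def amplicon_length(forward_5prime, reverse_5prime):
    """Length of the amplified segment, given the template positions
    (1-based, inclusive) of the 5' ends of the forward and reverse primers."""
    if reverse_5prime < forward_5prime:
        raise ValueError("reverse primer must lie downstream of forward primer")
    return reverse_5prime - forward_5prime + 1


def ideal_copies(initial_copies, cycles, efficiency=1.0):
    """Expected copies after `cycles` rounds of denaturation, annealing and
    extension. efficiency=1.0 models perfect doubling in every cycle."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles
```

Under the idealized doubling assumption, a single template copy would yield 2**30 copies after 30 cycles; real reactions plateau well before such numbers are reached, which is why the model is only a rough guide.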

As used herein, the term "sample template" refers to a nucleic acid originating 
from a sample which is analyzed for the presence of "target". In contrast, "background
template" is used in reference to nucleic acid other than sample template, which may or may
not be present in a sample. Background template is most often inadvertent. It may be the
result of carryover, or it may be due to the presence of nucleic acid contaminants sought to
be purified away from the sample. For example, nucleic acids other than those to be detected
may be present as background in a test sample. 

As used herein, the term "primer" refers to an oligonucleotide, whether occurring
naturally (for example, as in a purified restriction digest) or produced synthetically, which is
capable of acting as a point of initiation of nucleic acid synthesis when placed under
conditions in which synthesis of a primer extension product which is complementary to a
nucleic acid strand is induced (in other words, in the presence of nucleotides, an inducing
agent such as DNA polymerase, and under suitable conditions of temperature and pH). The
primer is preferably single-stranded for maximum efficiency in amplification, but may
alternatively be double-stranded. If double-stranded, the primer is first treated to separate its
strands before being used to prepare extension products. Preferably, the primer is an 
oligodeoxyribonucleotide. The primer must be sufficiently long to prime the synthesis of 
extension products in the presence of the inducing agent. The exact lengths of the primers 
will depend on many factors, including temperature, source of primer and use of the method. 

As used herein, the term "probe" refers to an oligonucleotide (in other words, a
sequence of nucleotides), whether occurring naturally (for example, as in a purified
restriction digest) or produced synthetically, recombinantly or by PCR amplification, which
is capable of hybridizing to another oligonucleotide of interest. A probe may be
single-stranded or double-stranded. Probes are useful in the detection, identification and isolation
of particular gene sequences. It is contemplated that the probe used in the present invention
is labeled with any "reporter molecule," so that it is detectable in a detection system,
including, but not limited to enzyme (in other words, ELISA, as well as enzyme-based
histochemical assays), fluorescent, radioactive, and luminescent systems. It is not intended
that the present invention be limited to any particular detection system or label. The terms
"reporter molecule" and "label" are used herein interchangeably. In addition to probes,
primers and deoxynucleoside triphosphates may contain labels; these labels may comprise,
but are not limited to, 32P, 33P, 35S, enzymes, or fluorescent molecules (e.g., fluorescent
dyes). 

As used herein, the term "polymerase" refers to any polymerase suitable for use in 
the amplification of nucleic acids of interest. It is intended that the term encompass such
DNA polymerases as Taq DNA polymerase obtained from Thermus aquaticus, although
other polymerases, both thermostable and thermolabile, are also encompassed by this
definition. 

As used herein, the terms "PCR product" and "amplification product" refer to the
resultant mixture of compounds after two or more cycles of the PCR steps of denaturation,
annealing and extension are complete. These terms encompass the case where there has
been amplification of one or more segments of one or more target sequences.

As used herein, the term "nested primers" refers to primers that anneal to the target
sequence in an area that is inside the annealing boundaries used to start PCR. (See, for
example, Mullis, KB et al. (1986) Cold Spring Harbor Symposia, Vol. LI, pp. 263-273).
Because the nested primers anneal to the target inside the annealing boundaries of the
starting primers, the predominant PCR-amplified product of the starting primers is
necessarily a longer sequence than that defined by the annealing boundaries of the nested
primers. The PCR-amplified product of the nested primers is an amplified segment of the
target sequence that cannot, therefore, anneal with the starting primers. 
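The geometric relationship between starting and nested primers can be checked with a short sketch. The types, names, and coordinates below are hypothetical. The sketch assumes each primer is described by the template positions (1-based, inclusive) of its annealing site, and it treats a pair as nested only when its annealing sites lie entirely between those of the starting pair, so that the nested product carries no starting-primer binding site.

```python
from typing import NamedTuple


class Site(NamedTuple):
    start: int  # first template position covered by the primer
    end: int    # last template position covered by the primer


class PrimerPair(NamedTuple):
    forward: Site
    reverse: Site


def product_length(pair):
    """Length of the product bounded by the outer edges of the two sites."""
    return pair.reverse.end - pair.forward.start + 1


def is_nested(outer, inner):
    """True if `inner` anneals strictly inside the boundaries of `outer`."""
    return (inner.forward.start > outer.forward.end
            and inner.reverse.end < outer.reverse.start)


# Hypothetical coordinates, chosen only to illustrate the relationship.
starting = PrimerPair(Site(100, 119), Site(581, 600))
nested = PrimerPair(Site(150, 169), Site(481, 500))
```

With these example coordinates the starting product spans 501 positions and the nested product 351, consistent with the statement that the product of the starting primers is necessarily the longer sequence.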

As used herein, the term "amplification reagents" refers to those reagents
(deoxyribonucleoside triphosphates, buffer, etc.), needed for amplification except for
primers, nucleic acid template and the amplification enzyme.

The term "sample" as used herein refers to any type of material obtained from 
plants, humans or other animals (for example, any bodily fluid or tissue), cell or tissue 
cultures, cell lines, or any in vitro culture. Indeed, the term "sample" as used herein is used 
in its broadest sense. A biological sample suspected of containing nucleic acid encoding
acyl-CoA thioesterase may comprise a cell, chromosomes isolated from a cell (e.g., a spread
of metaphase chromosomes), genomic DNA (in solution or bound to a solid support such as
for Southern blot analysis), RNA (in solution or bound to a solid support such as for
Northern blot analysis), cDNA (in solution or bound to a solid support) and the like. A
sample suspected of containing a protein may comprise a cell, a portion of a tissue, an extract
containing one or more proteins and the like.

As used herein, the term "eukaryote" refers to organisms distinguishable from 
"prokaryotes." It is intended that the term encompass all organisms with cells that exhibit 
the usual characteristics of eukaryotes such as the presence of a true nucleus bounded by a
nuclear membrane, within which lie the chromosomes, the presence of membrane-bound
organelles, and other characteristics commonly observed in eukaryotic organisms. Thus, the
term includes, but is not limited to such organisms as plants, fungi, protozoa, and animals 
(for example, humans). 

As used herein, the term "antimetabolite" refers to any substance with a close
structural resemblance to another, essential substance (in other words, metabolite) that is
required for normal physiologic function. Typically, antimetabolites exert their effects by
interfering with the utilization of the essential metabolite.

I. Acyl-CoA Thioesterases 

Acyl-CoA thioesterases (ACHs) catalyze the following general reaction: 

Acyl-CoA + H2O → free fatty acid and CoA

wherein acyl-CoA is hydrolyzed to free fatty acid and CoA. These reactions are important in
lipid metabolism. For example, previous studies have shown that there is an acyl-CoA
thioesterase that resides in the inner envelope of the chloroplast membrane of peas (Andrews
and Keegstra, 1983), although the protein sequence was never discovered, nor was the gene
for this thioesterase ever isolated. During the development of the present invention, genes
were identified that code for putative peroxisomal and mitochondrial acyl-CoA thioesterases. 
Taken together, these enzymes have the capability of regulating acyl-CoA traffic and usage 
by removing, through hydrolysis, specific acyl-CoAs from the lipid metabolism pathways. 
Although an understanding of the mechanism is not necessary in order to make and use the 

20 present invention, it is believed that this acyl-CoA thioesterase play a role in determining the 
acyl-CoA molecules that leave the chloroplast. Chloroplasts typically produce and export 16 
and 1 8 carbon acyl-CoAs. It is contemplated that if an irregular acyl-CoA is produced, the 
acyl-CoA thioesterase cleaves the CoA, preventing the export of this fatty acid by acyl-CoA 
and thus its incorporation into storage lipids. Eliminating the "proofreading activity" is 

25 contemplated as being important in applications involving plants that produce and store 

irregular fatty acids. However, it is not necessary to understand the mechanisms involved in 
this process in order to make and use the present invention. 

The present invention provides four acyl-CoA thioesterase genes, designated
"ACH1," "ACH2," "ACH4," and "ACH5." All of these genes were identified based upon
their amino acid homology to reported acyl-CoA thioesterases; however, the genes of the
present invention possess unusual amino acid sequences. Based upon amino acid homology
analysis, it appears that these four genes are members of two classes of acyl-CoA
thioesterases in Arabidopsis. These two classes are very different, as there is only about 30
percent identity between them.

The first class of acyl-CoA thioesterases contains genes encoding ACH1 and
ACH2, which were found to be 68.7 percent identical to each other. These two proteins
appear to be localized in peroxisomes of plants since the carboxy termini end in either "AKL"
or "SKL," both of which are consensus sequences for peroxisomal targeting (see Figure 5,
panel A). Both proteins also possess a unique amino-terminus sequence which is not
observed in any other known acyl-CoA thioesterase (see Figure 5, panel A). This
amino-terminus sequence contains a putative cGMP binding domain (which is boxed and
highlighted in Figure 5, panel A), which has not been identified in any other known
acyl-CoA thioesterase. The presence of a cGMP binding site suggests that these enzymes are
likely to be controlled through signaling cascades. It is contemplated that these enzymes are
involved in lipid oxidation.
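
The C-terminal check described above can be illustrated with a minimal sketch. It tests only whether a predicted protein ends in one of the two tripeptides named in the text ("AKL" or "SKL"). The function name and the example sequences are hypothetical placeholders and are not the sequences of SEQ ID NOs: 5 or 6.

```python
# Tripeptides named in the text as consensus sequences for peroxisomal targeting.
PEROXISOMAL_TRIPEPTIDES = ("AKL", "SKL")


def has_peroxisomal_c_terminus(protein: str) -> bool:
    """Return True if the one-letter protein sequence ends in AKL or SKL."""
    # Normalize case and drop a trailing stop symbol ("*") if one is present.
    sequence = protein.strip().upper().rstrip("*")
    # str.endswith accepts a tuple of candidate suffixes.
    return sequence.endswith(PEROXISOMAL_TRIPEPTIDES)


# Made-up sequences, for illustration only.
examples = ["MTEVQRSGAKL", "MTEVQRSGSKL", "MTEVQRSGAKLQ"]
flags = [has_peroxisomal_c_terminus(s) for s in examples]
```

A match on this tripeptide alone is only suggestive; the localization itself would still need to be confirmed experimentally.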

The second class contains genes encoding ACH4 and ACH5, which were found to
be 57.8 percent identical to each other, and 66.7 percent similar to each other, where
similarity refers to amino acids with similar chemical characteristics, such as hydrophobicity,
hydrophilicity, etc. These two proteins appear to be localized to the mitochondria. These
genes were identified by their amino acid homology to a rat mitochondrial acyl-CoA
thioesterase gene. Furthermore, ACH4 and ACH5 each possess a unique amino terminus
sequence, which is contemplated to correspond to a mitochondrial transit peptide, as
mitochondrial targeting sequences are generally not conserved. Moreover, these proteins
were not imported into chloroplasts, ruling out the possibility that the N-termini correspond
to plastid targeting signals. Since there is no lipid oxidation that occurs in plant
mitochondria, it is contemplated that these enzymes play a role in the plant mitochondrial
lipid synthesis pathway that has recently been characterized (Gueguen et al. 2000).
Experiments to determine the localization of these proteins fused to green fluorescent protein
indicate that these fusion proteins ultimately end up in the mitochondria.
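
The distinction drawn above between percent identity and percent similarity can be sketched as follows. This is an illustrative calculation only: it assumes the two sequences have already been aligned (gaps written as "-"), it counts only ungapped columns in the denominator, and it treats residues as "similar" when they fall in the same side-chain family. The grouping and the denominator convention are assumptions and are not necessarily those used to obtain the values reported here.

```python
# Side-chain families in one-letter code. This grouping is an assumption made
# for illustration, not the scoring scheme behind the reported percentages.
FAMILIES = [
    set("DE"),        # acidic
    set("KRH"),       # basic
    set("AVLIPFMW"),  # nonpolar
    set("GNQCSTY"),   # uncharged polar
]


def same_family(a: str, b: str) -> bool:
    """True if both residues belong to the same side-chain family."""
    return any(a in family and b in family for family in FAMILIES)


def percent_identity_and_similarity(aligned_a: str, aligned_b: str):
    """Return (percent identity, percent similarity) over ungapped columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = similar = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            identical += 1
            similar += 1  # identical residues also count as similar
        elif same_family(a, b):
            similar += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared, 100.0 * similar / compared
```

Because identical residues are also counted as similar, percent similarity computed this way can never be lower than percent identity, which is consistent with the 57.8 and 66.7 percent figures given for ACH4 and ACH5.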

Thus, it is contemplated that these genes will find use in the development of plants
containing specialized fatty acid compositions. Each of these genes is discussed in further
detail below.

A. Class 1: Putative Peroxisomal Enzymes



1. ACH1

ACH1 was identified using an Arabidopsis database and shows high homology to
a human acyl-CoA thioesterase (HIV-1 Nef-associated acyl-CoA thioesterase). Portions of
ACH1 cDNA were subcloned using a cDNA library as a template for RT-PCR reactions.
Initially, the 5' and 3' ends of the gene were not identified due to the weak homology
between these regions of the gene and other thioesterases. In addition, there appeared to be
cDNAs of two different sizes. The discovery of an additional predicted open reading frame
(ORF) positioned very closely to what had been considered the ACH1 start codon resulted in
the identification of the 5' end much further upstream than originally thought. This region
had gone unobserved as it bore no resemblance to other known acyl-CoA thioesterases. This
extra bit of sequence, which corresponded to two extra exons of coding region, allowed the
identification of cDNA clones in the database that encoded the entire ACH1 cDNA. Two
clones were identified, one of which was shorter than the other and had an internal nonsense
codon. The longer clone was designated as the true ACH1 cDNA. This cDNA is shown in
Figure 1 (SEQ ID NO:1), while the peptide sequence is shown in Figure 5, Panel A (SEQ ID
NO:5). Figure 5, Panel A also indicates the putative cGMP binding domain (shown boxed
and highlighted). This region has not been identified in other known acyl-CoA thioesterases.

Importantly, the C-terminus of the putative protein was "AKL," a consensus
sequence for targeting to the peroxisome. The extra portion at the N-terminus bore a weak
homology to cyclic-GMP (cGMP) binding proteins, indicating that the enzyme is likely to be
controlled through signaling cascades. Reverse transcriptase-polymerase chain reaction
(RT-PCR) was used to demonstrate the presence of mRNA in various tissues (according to
protocols designed by GIBCO). Indeed, ACH1 mRNA was found to be expressed in all
tissues analyzed, including dry seed, root, leaf, rosette leaves, total aerials, and siliques.
RNA blot analysis showed that ACH1 is constitutively expressed at equal levels in all
tissues examined, including rosette leaves, leaves, young flowers, old flowers, and siliques.

Subsequent expression studies, in which the gene was over-expressed in E. coli,
indicated that the gene product could be successfully over-expressed, but apparently only in
insoluble form. Moreover, no increased acyl-CoA thioesterase activity was observed in cell
lysates, which suggests that the produced protein is likely to be improperly folded under the
over-expression conditions utilized in these experiments.

At least one mutant Arabidopsis line carrying a T-DNA insertion in the ACH1
gene has been identified; mutant seeds are germinated and grown, and the effects of the
mutation on plant morphological, physiological, and biochemical characteristics are
evaluated. It is contemplated that, because there appears to be at least one additional
acyl-CoA thioesterase in this class, the mutant plants will not be significantly different from
wild-type plants. However, it is anticipated that the mutants can be crossed with a plant carrying a
T-DNA insertion in the ACH2 gene, thus completely removing the activity of the enzymes of
this class. This will be useful for further evaluating the function of these genes in vivo.



2. ACH2

As with ACH1, ACH2 was first identified using the human acyl-CoA thioesterase
as a search tool in the Arabidopsis database. Although the full-length gene was not
identified in this way, an expressed sequence tag (EST) was identified that corresponded to
about 250 base pairs (bp) of the ACH2 cDNA. Using primers based on this sequence, a
genomic PCR product was generated for use in screening a genomic library for the
full-length gene. Subsequently, a genomic clone was isolated from a genomic phage library and
sequenced. As was observed for ACH1, RNA blot hybridization indicated that ACH2 was
constitutively expressed in all tissues examined.

Also as was observed for ACH1, it was recognized that the ACH2 gene had more
sequence in its 5' coding region than was originally expected. Analysis of the putative gene product
revealed a protein that was 68.7 percent identical to ACH1, and had a consensus sequence for
targeting to the peroxisome (in other words, SKL at the C-terminus). Full-length cDNA was
then cloned using the RT-PCR product. As the RT-PCR seemed to be prone to mistakes, it
took some time and effort to find a clone with the correct sequence. Only RT-PCR
using rosette leaf mRNA as a template yielded a product with the correct sequence. The
cDNA sequence is shown in Figure 2 (SEQ ID NO:2), while the peptide sequence is shown
in Figure 5, Panel B (SEQ ID NO:6). Figure 5, Panel B also indicates the putative cGMP
binding domain (shown boxed and highlighted). This region has not been identified in other
known acyl-CoA thioesterases.

In fact, for a period of time, a sequence initially thought to be a third isoform of
ACH1 and ACH2 was identified. However, once the entire gene was sequenced, it was
determined that the sequence was identical to ACH2.

Subsequent expression studies of ACH2, in which the gene was over-expressed as
a fusion product, indicated that the gene product could be successfully over-expressed. The
protein was over-expressed as both a fusion product with MBP and as a fusion product with
a six histidine tag. Both fusion products were soluble, and both possessed acyl-CoA
thioesterase activity, demonstrating that the gene product is an acyl-CoA thioesterase.

At least one mutant Arabidopsis line carrying a T-DNA insertion has been
identified; mutant seeds are germinated and grown, and the effects of the mutation on plant
morphological, physiological, and biochemical characteristics are evaluated. It is
contemplated that, because there appears to be at least one additional acyl-CoA thioesterase
in this class, the mutant plants will not be significantly different from wild-type plants.
However, it is anticipated that the mutants can be crossed with a plant carrying a T-DNA
insertion in the ACH1 gene, thus completely removing the activity of the enzymes of this
class. This will be useful for further evaluating the function of these genes in vivo.

B. Class 2: Putative Mitochondrial Enzymes

1. ACH4

This gene was identified based upon its homology to a rat mitochondrial acyl-CoA
thioesterase. The full-length gene was in the Arabidopsis database, but it was not originally
possible to identify the exact positions of the 3' and 5' ends of the gene. Attempts to isolate
the ACH4 cDNA from a lambda-PRL library and screens for a T-DNA mutant were
unsuccessful. In view of these failures, a cDNA sequence obtained from Genome Systems,
Inc. was sequenced. Sequencing provided the putative 5' end of the gene. The putative
protein product is homologous to other known acyl-CoA thioesterases, with the exception of
the N-terminus. Thus, it is contemplated that this sequence corresponds to a transit peptide
which is most likely mitochondrial. Mitochondrial target sequences are generally not
conserved, and thus are variable. The cDNA sequence is shown in Figure 3 (SEQ ID NO:3),
while the peptide sequence is shown in Figure 6, Panel A (SEQ ID NO:7).

The localization of ACH4 was determined using several approaches. In vitro
experiments indicated that ACH4 was not imported into chloroplasts. Based on its
homology to mitochondrial homologues in other organisms, it is very likely localized to the
mitochondria. Consistent with this, in vivo experiments to determine the sub-cellular location of ACH4
with an ACH4-green fluorescent protein (GFP) construct indicate that ACH4 is ultimately
localized to the mitochondria.

2. ACH5

The ACH5 genomic sequence was also identified using the Arabidopsis database.
Interestingly, although a portion of the cDNA was cloned during the development of the
present invention, initial work with this gene had failed to identify the 5' region. However,
Genefinder was used to predict a protein product from the ACH5 gene that appeared to be
correct. RT-PCR was subsequently used to amplify the putative full-length cDNA.
Sequencing was then conducted on the full-length cDNA, which codes for a protein that is
nearly identical to the one initially predicted by Genefinder, with the exception of a small
region positioned towards the C-terminus. The cDNA sequence is shown in Figure 4 (SEQ
ID NO:4), while the peptide sequence is shown in Figure 6, Panel B (SEQ ID NO:8).

The activity of a fusion protein, ACH5-6His, which was over-expressed in E. coli,
was assayed in the soluble extract of lysed cells. The extract from cells expressing the fusion
protein displayed an acyl-CoA thioesterase activity about 5 times greater than that observed
in control cells, indicating that the gene product is an acyl-CoA thioesterase.

Localization studies in vitro indicated that, as with ACH4, ACH5 was not
imported into chloroplasts, indicating that its unique N-terminus played some other role than
a plastid targeting signal. Due to its homology with other mitochondrial acyl-CoA
thioesterases from other organisms, it is believed that the N-terminus serves as a
mitochondrial targeting signal. Consistent with this, in vivo experiments with an ACH5-green fluorescent
protein (GFP) fusion protein construct indicate that the fusion protein is ultimately localized
to the mitochondria. Thus, these results indicate that ACH5 is localized in the mitochondria,
as is ACH4.

Although a T-DNA mutant plant was identified with a T-DNA insertion in the
ACH5 gene, there was no visible phenotype associated with this mutant. Experiments to
further characterize the function of these enzymes include the transformation of this T-DNA
mutant with antisense constructs for ACH4. It is anticipated that such constructs will have
varying and detrimental effects on the transformed plants.

C. Genetic Family Tree 

The unique feature of ACH1 and ACH2 that sets them apart from other acyl-CoA
thioesterases previously reported is a 120 amino acid extension at the N-terminus, as
described above. Although all forms of life have enzymes that are homologous to ACH1 and
ACH2, it appears that only plants have enzymes with this unique N-terminal extension. This
was determined by searching public databases and looking for sequences similar to those of
ACH1 and ACH2. Gene sequences coding for enzymes similar to ACH1 and ACH2, with
the heretofore-undescribed N-terminal region, can be found in
wheat, tomato, soybean, maize, rice, medicago, barley, and potato. The BLAST server at
NCBI failed to find any of these other plant sequences, but they can be found by performing
BLAST searches with the TIGR Gene Indices. As noted above, the N-terminal extension
bears some homology to cyclic-nucleotide binding domains that bind cyclic-AMP (cAMP)
or cyclic-GMP (cGMP).

Predicted amino acid sequences of these other ACH proteins were compared, and
used as the basis to construct a phylogenetic tree (G. Tilton, manuscript in preparation).

D. Summary 

Thus, the present invention provides nucleic acids encoding plant ACHs (for
example, SEQ ID NOs: 1-4). Other embodiments of the present invention provide nucleic
acid sequences that are capable of hybridizing to SEQ ID NOs: 1-4 under conditions of high
to low stringency. In some embodiments, the hybridizing nucleic acid sequence encodes a
protein that retains at least one biological activity of the naturally occurring ACH from
which it is derived. In preferred embodiments, hybridization conditions are based on the
melting temperature (Tm) of the nucleic acid binding complex and confer a defined
"stringency" as explained above.

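As a rough illustration of how a melting temperature can be estimated from sequence composition, the sketch below applies the Wallace rule of thumb, Tm = 2(A+T) + 4(G+C) in degrees Celsius, which is commonly used for short oligonucleotide probes. The rule is an assumption introduced here for illustration; no particular Tm formula is specified in the text, and the rule is not suited to long hybridization probes.

```python
def wallace_tm(oligo: str) -> int:
    """Estimate Tm (degrees C) of a short DNA oligo: 2*(A+T) + 4*(G+C)."""
    seq = oligo.upper()
    # Reject empty input and anything other than the four DNA bases.
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("sequence must be non-empty and contain only A, C, G, T")
    at_count = seq.count("A") + seq.count("T")
    gc_count = seq.count("G") + seq.count("C")
    return 2 * at_count + 4 * gc_count


# Hypothetical 16-nucleotide probe with 8 A/T and 8 G/C bases.
estimated_tm = wallace_tm("ATGCATGCATGCATGC")  # 2*8 + 4*8 = 48
```

Higher G+C content raises the estimate, which is why G+C-rich probes tolerate more stringent wash conditions.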
In other embodiments of the present invention, variants of the disclosed ACHs are
provided. In preferred embodiments, variants result from mutation (in other words, a
change in the nucleic acid sequence) and generally produce altered mRNAs or polypeptides
whose structure or function may or may not be altered. Any given gene may have none, one,
or many variant forms. Common mutational changes that give rise to variants are generally
ascribed to deletions, additions or substitutions of nucleic acids. Each of these types of
changes may occur alone, or in combination with the others, and at the rate of one or more
times in a given sequence.

It is contemplated that it is possible to modify the structure of a peptide having an
activity (for example, ACH activity) for such purposes as increasing synthetic activity or
altering the affinity of the ACH for a particular substrate. Such modified peptides are
considered functional equivalents of peptides having an activity of an ACH as defined
herein. A modified peptide can be produced in which the nucleotide sequence encoding the
polypeptide has been altered, such as by substitution, deletion, or addition. In some
preferred embodiments of the present invention, the alteration increases synthetic activity or
alters the affinity of the ACH for a particular substrate. In particularly preferred
embodiments, these modifications do not significantly reduce the synthetic activity of the
modified enzyme. In other words, construct "X" can be evaluated according to the following
protocol in order to determine whether it is a member of the genus of modified or variant
ACHs of the present invention as defined functionally, rather than structurally. In preferred
embodiments, the activity of variant ACHs is evaluated using the ACH activity assay
described herein at Example 4.

Moreover, as described above, variant forms of ACHs are also contemplated as
being equivalent to those peptides and DNA molecules that are set forth in more detail
herein. For example, it is contemplated that isolated replacement of a leucine with an
isoleucine or valine, an aspartate with a glutamate, a threonine with a serine, or a similar
replacement of an amino acid with a structurally related amino acid (in other words,
conservative mutations) will not have a major effect on the biological activity of the resulting
molecule. Accordingly, some embodiments of the present invention provide variants of
ACHs disclosed herein containing conservative replacements. Conservative replacements
are those that take place within a family of amino acids that are related in their side chains.
Genetically encoded amino acids can be divided into four families: (1) acidic (aspartate,
glutamate); (2) basic (lysine, arginine, histidine); (3) nonpolar (alanine, valine, leucine,
isoleucine, proline, phenylalanine, methionine, tryptophan); and (4) uncharged polar
(glycine, asparagine, glutamine, cysteine, serine, threonine, tyrosine). Phenylalanine,
tryptophan, and tyrosine are sometimes classified jointly as aromatic amino acids. In similar
fashion, the amino acid repertoire can be grouped as (1) acidic (aspartate, glutamate); (2)
basic (lysine, arginine, histidine); (3) aliphatic (glycine, alanine, valine, leucine, isoleucine,
serine, threonine), with serine and threonine optionally being grouped separately as
aliphatic-hydroxyl; (4) aromatic (phenylalanine, tyrosine, tryptophan); (5) amide (asparagine,
glutamine); and (6) sulfur-containing (cysteine and methionine) (e.g., Stryer ed.,
Biochemistry, pg. 17-21, 2nd ed., WH Freeman and Co., 1981). Whether a change in the
amino acid sequence of a peptide results in a functional homolog can be readily determined
by assessing the ability of the variant peptide to function in a fashion similar to the wild-type
protein. Peptides having more than one replacement can readily be tested in the same
manner.

More rarely, a variant includes "nonconservative" changes (for example,
replacement of a glycine with a tryptophan). Analogous minor variations can also include
amino acid deletions or insertions, or both. Guidance in determining which amino acid
residues can be substituted, inserted, or deleted without abolishing biological activity can be
found using computer programs (for example, LASERGENE software, DNASTAR Inc.,
Madison, Wis.).
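
The two amino acid groupings described above can be written out as lookup tables, and a single substitution can then be classified as conservative or nonconservative under either grouping. The following is a minimal sketch; the function names are hypothetical, and the second table follows the six-group listing exactly as given in the text, which does not assign proline to any group.

```python
# Four-family grouping from the text (one-letter amino acid codes).
FOUR_FAMILIES = {
    "acidic": "DE",
    "basic": "KRH",
    "nonpolar": "AVLIPFMW",
    "uncharged polar": "GNQCSTY",
}

# Six-group alternative from the text; proline (P) is not listed in it.
SIX_GROUPS = {
    "acidic": "DE",
    "basic": "KRH",
    "aliphatic": "GAVLIST",
    "aromatic": "FYW",
    "amide": "NQ",
    "sulfur-containing": "CM",
}


def group_of(residue: str, scheme: dict):
    """Return the name of the group containing the residue, or None."""
    for name, members in scheme.items():
        if residue in members:
            return name
    return None


def is_conservative(original: str, replacement: str, scheme: dict) -> bool:
    """True if both residues fall within the same group of the given scheme."""
    a = group_of(original.upper(), scheme)
    b = group_of(replacement.upper(), scheme)
    return a is not None and a == b
```

The answer can depend on the grouping chosen: replacing phenylalanine with tyrosine is nonconservative under the four-family scheme but conservative under the six-group scheme, whereas replacing glycine with tryptophan is nonconservative under both. Such a classification is only a first guide; whether a variant is a functional equivalent is decided by the activity assay.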

As described in more detail below, variants may be produced by methods such as
directed evolution or other techniques for producing combinatorial libraries of variants.
In still other embodiments of the present invention, the
nucleotide sequences of the present invention may be engineered in order to alter an ACH
coding sequence including, but not limited to, alterations that modify the cloning, processing,
localization, secretion, and/or expression of the gene product. For example, mutations may
be introduced using techniques that are well known in the art (e.g., site-directed mutagenesis
to insert new restriction sites, alter glycosylation patterns, or change codon preference, etc.).

In some embodiments, the present invention provides ACH polypeptides (for
example, SEQ ID NOs: 5-8). In other embodiments, the present invention provides unique
cGMP-binding domains associated with ACH1 and ACH2 (in other words, SEQ ID NOs: 11
and 12). Still further embodiments of the present invention provide fragments, fusion
proteins or functional equivalents of ACHs. In still other embodiments of the present
invention, nucleic acid sequences corresponding to a selected ACH may be used to generate
recombinant DNA molecules that direct the expression of an ACH and variants in
appropriate host cells. In some embodiments of the present invention, the polypeptide may
be a naturally purified product, while in other embodiments it may be a product of chemical
synthetic procedures, and in still other embodiments it may be produced by recombinant
techniques using a prokaryotic or eukaryotic host cell (for example, by bacterial cells in
culture). In other embodiments, the polypeptides of the invention may also include an initial
methionine amino acid residue.

In one embodiment of the present invention, due to the inherent degeneracy of the
genetic code, DNA sequences other than SEQ ID NOs: 1-4 or 11-12, encoding substantially
the same or a functionally equivalent amino acid sequence, may be used to clone and express
an ACH. In general, such nucleic acid sequences hybridize to SEQ ID NOs: 1-4, as well as
SEQ ID NOs: 11-12, under conditions of high to low stringency as described above. As will
be understood by those of skill in the art, it may be advantageous to produce ACH-encoding
nucleotide sequences possessing non-naturally occurring codons. Therefore, in some
preferred embodiments, codons preferred by a particular prokaryotic or eukaryotic host are
selected, for example, to increase the rate of ACH expression or to produce recombinant
RNA transcripts having desirable properties, such as increased synthetic activity or altered
affinity of the ACH for a particular substrate.
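
The degeneracy of the genetic code and the selection of host-preferred codons can be illustrated with a short sketch. It builds the standard genetic code table, shows that two different DNA sequences encode the same peptide, and re-encodes a peptide using a table of preferred codons. The peptide, the DNA sequences, and the preference table are made-up examples and do not describe any actual host or any sequence of the invention.

```python
from itertools import product

# Standard genetic code, with codons ordered by bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence (length divisible by 3) to protein."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def synonymous_codons(amino_acid: str) -> list:
    """Return every codon that encodes the given amino acid."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]


def recode(peptide: str, preferred: dict) -> str:
    """Encode a peptide, using the preferred codon for each residue if given."""
    codons = []
    for aa in peptide.upper():
        options = synonymous_codons(aa)
        if not options:
            raise ValueError(f"unknown amino acid: {aa}")
        choice = preferred.get(aa, options[0])  # fall back to the first codon
        if choice not in options:
            raise ValueError(f"{choice} does not encode {aa}")
        codons.append(choice)
    return "".join(codons)


# Hypothetical codon preferences, for illustration only.
HYPOTHETICAL_PREFERENCES = {"A": "GCC", "K": "AAG", "L": "CTG"}
recoded = recode("MAKL", HYPOTHETICAL_PREFERENCES)
```

Here `translate("ATGGCTAAATTA")` and `translate("ATGGCCAAGCTG")` both give the peptide MAKL, so recoding changes the nucleotide sequence without changing the encoded amino acid sequence.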

II. Uses of ACH Polynucleotides and Polypeptides

1. Vectors for Expression of ACHs

In some embodiments of the present invention, the ACH nucleic acids are used to
construct vectors for the expression of ACH polypeptides. Accordingly, the nucleic acids of
the present invention may be employed for producing polypeptides by recombinant
techniques. Thus, for example, the nucleic acid may be included in any one of a variety of
expression vectors for expressing a polypeptide.

In some embodiments of the present invention, vectors are provided for the
transfection of plant hosts to create transgenic plants. In general, these vectors comprise an
ACH nucleic acid (for example, SEQ ID NOs: 1-4) operably linked to a promoter and other
regulatory sequences (for example, enhancers, polyadenylation signals, etc.) required for
expression in a plant. The ACH nucleic acid can be oriented to produce sense or antisense
transcripts, depending on the desired use. In some embodiments, the promoter is a
constitutive promoter (for example, superpromoter or 35S promoter). In other embodiments,
the promoter is a seed specific promoter (for example, phaseolin promoter [See for example,
U.S. Pat. No. 5,589,616], napin promoter [See for example, U.S. Pat. No. 5,608,152], or
acyl-CoA carrier protein promoter [See for example, U.S. Pat. No. 5,767,363]).
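
The difference between the sense and antisense orientations mentioned above can be sketched in a few lines: placing the reverse complement of the coding strand downstream of the promoter yields a transcript complementary to the native mRNA. The function names and the example sequence are hypothetical, and the sketch ignores cloning details such as restriction sites and flanking vector sequence.

```python
# Translation table mapping each DNA base to its complement (both cases).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return dna.translate(COMPLEMENT)[::-1]


def insert_for_orientation(coding_strand: str, orientation: str) -> str:
    """Return the strand to place downstream of the promoter, read 5' to 3'."""
    if orientation == "sense":
        return coding_strand
    if orientation == "antisense":
        return reverse_complement(coding_strand)
    raise ValueError("orientation must be 'sense' or 'antisense'")


# Made-up fragment, for illustration only.
antisense_insert = insert_for_orientation("ATGGCC", "antisense")  # "GGCCAT"
```

Applying `reverse_complement` twice returns the original sequence, which is a convenient sanity check when preparing either orientation.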

In some preferred embodiments, the vector is adapted for use in an
Agrobacterium-mediated transfection process (See for example, U.S. Pat. Nos. 5,981,839; 6,051,757;
5,981,840; 5,824,877; and 4,940,838). Construction of recombinant Ti and Ri plasmids in
general follows methods typically used with the more common bacterial vectors, such as
pBR322. Additional use can be made of accessory genetic elements sometimes found with
the native plasmids and sometimes constructed from foreign sequences. These may include
but are not limited to structural genes for antibiotic resistance as selection genes.

Two recombinant Ti and Ri plasmid vector systems are now in
use. The first system is called the "cointegrate" system. In this system, the shuttle vector
containing the gene of interest is inserted by genetic recombination into a non-oncogenic Ti
plasmid that contains both the cis-acting and trans-acting elements required for plant
transformation as, for example, in the pMLJ1 shuttle vector and the non-oncogenic Ti
plasmid pGV3850. The second system is called the "binary" system in which two plasmids
are used; the gene of interest is inserted into a shuttle vector containing the cis-acting
elements required for plant transformation. The other necessary functions are provided in
trans by the non-oncogenic Ti plasmid as exemplified by the pBIN19 shuttle vector and the
non-oncogenic Ti plasmid pAL4404. Some of these vectors are commercially available.

It may be desirable to target the nucleic acid sequence of interest to a particular
locus on the plant genome. Site-directed integration of the nucleic acid sequence of interest
into the plant cell genome may be achieved by, for example, homologous recombination
using Agrobacterium-derived sequences. Generally, plant cells are incubated with a strain of
Agrobacterium which contains a targeting vector in which sequences that are homologous to
a DNA sequence inside the target locus are flanked by Agrobacterium transfer-DNA
(T-DNA) sequences, as previously described (U.S. Pat. No. 5,501,967). One of skill in the art
knows that homologous recombination may be achieved using targeting vectors which
contain sequences that are homologous to any part of the targeted plant gene, whether
belonging to the regulatory elements of the gene, or the coding regions of the gene.
Homologous recombination may be achieved at any region of a plant gene so long as the
nucleic acid sequence of regions flanking the site to be targeted is known.

The nucleic acids of the present invention may also be utilized to construct vectors
derived from plant (+) RNA viruses (for example, brome mosaic virus, tobacco mosaic virus,
alfalfa mosaic virus, cucumber mosaic virus, tomato mosaic virus, and combinations and
hybrids thereof). Generally, the inserted ACH polynucleotide can be expressed from these
vectors as a fusion protein (for example, coat protein fusion protein) or from its own
subgenomic promoter or other promoter. Methods for the construction and use of such
viruses are described in U.S. Pat. Nos. 5,846,795; 5,500,360; 5,173,410; 5,965,794;
5,977,438; and 5,866,785.

Alternatively, vectors can be constructed for expression in hosts other than plants (for
example, prokaryotic cells such as E. coli, yeast cells, C. elegans, and mammalian cell
culture cells). In some embodiments of the present invention, vectors include, but are not
limited to, chromosomal, nonchromosomal and synthetic DNA sequences (for example,
derivatives of SV40, bacterial plasmids, phage DNA, baculovirus, yeast plasmids, vectors
derived from combinations of plasmids and phage DNA, and viral DNA such as vaccinia,
adenovirus, fowl pox virus, and pseudorabies). Large numbers of suitable vectors that are
replicable and viable in the host are known to those of skill in the art, and are commercially
available. Any other plasmid or vector may be used as long as it is replicable and viable
in the host.

In some preferred embodiments of the present invention, bacterial expression
vectors comprise an origin of replication, a suitable promoter and optionally an enhancer,
and also any necessary ribosome binding sites, polyadenylation sites, transcriptional
termination sequences, and 5' flanking nontranscribed sequences. Promoters useful in the
present invention include, but are not limited to, retroviral LTRs, SV40 promoter, CMV
promoter, RSV promoter, E. coli lac or trp promoters, phage lambda PL and PR promoters,
T3, SP6 and T7 promoters. In other embodiments of the present invention, recombinant
expression vectors include origins of replication and selectable markers (e.g., tetracycline or
ampicillin resistance in E. coli, or neomycin phosphotransferase gene for selection in
eukaryotic cells).



2. Expression of ACHs in Transgenic Plants 

Vectors described above can be utilized to express the ACHs of the present
invention in transgenic plants. A variety of methods are known for producing transgenic
plants.

In some embodiments, Agrobacterium-mediated transfection is utilized to create
transgenic plants. Since most dicotyledonous plants are natural hosts for Agrobacterium,
almost every dicotyledonous plant may be transformed by Agrobacterium in vitro. Although
monocotyledonous plants, and in particular, cereals and grasses, are not natural hosts to
Agrobacterium, work to transform them using Agrobacterium has also been carried out
(Hooykaas-Van Slogteren et al., (1984) Nature 311:763-764). Plant genera that may be
transformed by Agrobacterium include Arabidopsis, Chrysanthemum, Dianthus, Gerbera,
Euphorbia, Pelargonium, Ipomoea, Passiflora, Cyclamen, Malus, Prunus, Rosa, Rubus,
Populus, Santalum, Allium, Lilium, Narcissus, Ananas, Arachis, Phaseolus and Pisum.

For transformation with Agrobacterium, disarmed Agrobacterium cells are
transformed with recombinant Ti plasmids of Agrobacterium tumefaciens or Ri plasmids of
Agrobacterium rhizogenes (such as those described in U.S. Patent No. 4,940,838). The
nucleic acid sequence of interest is then stably integrated into the plant genome by infection
with the transformed Agrobacterium strain. For example, heterologous nucleic acid
sequences have been introduced into plant tissues using the natural DNA transfer system of
Agrobacterium tumefaciens and Agrobacterium rhizogenes bacteria (for review, see Klee et
al., (1987) Ann. Rev. Plant Phys. 38:467-486).

There are three common methods to transform plant cells with Agrobacterium.
The first method is by co-cultivation of Agrobacterium with cultured isolated protoplasts.
This method requires an established culture system that allows culturing protoplasts and
plant regeneration from cultured protoplasts. The second method is by transformation of
cells or tissues with Agrobacterium. This method requires (a) that the plant cells or tissues
can be transformed by Agrobacterium and (b) that the transformed cells or tissues can be
induced to regenerate into whole plants. The third method is by transformation of seeds,
apices or meristems with Agrobacterium. This method requires micropropagation.

One of skill in the art knows that the efficiency of transformation by
Agrobacterium may be enhanced by using a number of methods known in the art. For
example, the inclusion of a natural wound response molecule such as acetosyringone (AS) in
the Agrobacterium culture has been shown to enhance transformation efficiency with
Agrobacterium tumefaciens (Shahla et al., (1987) Plant Molec. Biol. 8:291-298).
Alternatively, transformation efficiency may be enhanced by wounding the target tissue to be
transformed. Wounding of plant tissue may be achieved, for example, by punching,
maceration, bombardment with microprojectiles, etc. (See e.g., Bidney et al., (1992) Plant
Molec. Biol. 18:301-313).

In still further embodiments, the plant cells are transfected with vectors via
particle bombardment (in other words, with a gene gun). Particle-mediated gene transfer
methods are known in the art, are commercially available, and include, but are not limited to,
the gas-driven gene delivery instrument described in McCabe, U.S. Pat. No. 5,584,807. This
method involves coating the nucleic acid sequence of interest onto heavy metal particles, and
accelerating the coated particles under the pressure of compressed gas for delivery to the
target tissue.

Other particle bombardment methods are also available for the introduction of
heterologous nucleic acid sequences into plant cells. Generally, these methods involve
depositing the nucleic acid sequence of interest upon the surface of small, dense particles of
a material such as gold, platinum, or tungsten. The coated particles are themselves then
coated onto either a rigid surface, such as a metal plate, or onto a carrier sheet made of a
fragile material such as mylar. The coated sheet is then accelerated toward the target
biological tissue. The use of the flat sheet generates a uniform spread of accelerated particles
which maximizes the number of cells receiving particles under uniform conditions, resulting
in the introduction of the nucleic acid sample into the target tissue.

Plants, plant cells and tissues transformed with a heterologous nucleic acid
sequence of interest are readily detected using methods known in the art including, but not
limited to, restriction mapping of the genomic DNA, PCR analysis, DNA-DNA
hybridization, DNA-RNA hybridization, DNA sequence analysis and the like.

Additionally, selection of transformed plant cells may be accomplished using a
selection marker gene. It is preferred, though not necessary, that a selection marker gene be
used to select transformed plant cells. A selection marker gene may confer positive or
negative selection.

A positive selection marker gene may be used in constructs for random integration
and site-directed integration. Positive selection marker genes include antibiotic resistance
genes, herbicide resistance genes and the like. In one embodiment, the positive selection
marker gene is the NPTII gene, which confers resistance to geneticin (G418) or kanamycin.
In another embodiment, the positive selection marker gene is the HPT gene, which confers
resistance to hygromycin. The choice of the positive selection marker gene is not critical to
the invention as long as it encodes a functional polypeptide product. Positive selection genes
known in the art include, but are not limited to, the ALS gene (chlorsulphuron resistance)
and the DHFR gene (methotrexate resistance).

A negative selection marker gene may also be included in the constructs. The use
of one or more negative selection marker genes in combination with a positive selection
marker gene is preferred in constructs used for homologous recombination. Negative
selection marker genes are generally placed outside the regions involved in the homologous
recombination event. The negative selection marker gene serves to provide a disadvantage
(preferably lethality) to cells that have integrated these genes into their genome in an
expressible manner. Cells in which the targeting vectors for homologous recombination are
randomly integrated in the genome will be harmed or killed due to the presence of the
negative selection marker gene. Where a positive selection marker gene is included in the
construct, only those cells having the positive selection marker gene integrated in their
genome will survive.

The choice of the negative selection marker gene is not critical to the invention as
long as it encodes a functional polypeptide in the transformed plant cell. The negative
selection gene may for instance be chosen from the aux-2 gene from the Ti-plasmid of
Agrobacterium, the tk-gene from SV40, cytochrome P450 from Streptomyces griseolus, the
Adh gene from maize or Arabidopsis, etc. Any gene encoding an enzyme capable of
converting a substance which is otherwise harmless to plant cells into a substance which is
harmful to plant cells may be used.
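
By way of illustration only, the combined positive and negative selection logic described above can be sketched as a small truth table. This is an idealized model, not part of the disclosed methods: the `Cell` class and `survives` function are hypothetical names, selection is assumed to be complete, and random integration is assumed always to carry the flanking negative marker along with the positive marker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Idealized cell, described only by which marker genes it expresses."""
    has_positive_marker: bool  # e.g. NPTII or HPT integrated and expressed
    has_negative_marker: bool  # e.g. a gene converting a harmless substance to a toxin


def survives(cell: Cell) -> bool:
    """A cell survives only if it has the positive marker and lacks the negative one."""
    return cell.has_positive_marker and not cell.has_negative_marker


# Three idealized outcomes of transforming with a targeting vector.
untransformed = Cell(has_positive_marker=False, has_negative_marker=False)
# Random integration: the whole vector, including the flanking negative marker.
random_integration = Cell(has_positive_marker=True, has_negative_marker=True)
# Homologous recombination: the negative marker lies outside the exchanged region.
homologous_recombinant = Cell(has_positive_marker=True, has_negative_marker=False)

if __name__ == "__main__":
    for name, cell in [
        ("untransformed", untransformed),
        ("random integration", random_integration),
        ("homologous recombinant", homologous_recombinant),
    ]:
        print(f"{name}: {'survives' if survives(cell) else 'eliminated'}")
```

Under these assumptions only the homologous recombinant passes both selections, which is why the negative marker is placed outside the regions involved in the recombination event.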

It is contemplated that the ACH polynucleotides of the present invention may be 
utilized to either increase or decrease the level of ACH mRNA and/or protein in transfected 
cells as compared to the levels in wild-type cells. Accordingly, in some embodiments, 
expression in plants by the methods described above leads to the overexpression of ACH in 
transgenic plants, plant tissues, or plant cells. The present invention is not limited to any 
particular mechanism. Indeed, an understanding of a mechanism is not required to practice
the present invention. However, it is contemplated that overexpression of the ACH 
polynucleotides of the present invention will overcome limitations in the accumulation of 
fatty acids in oilseeds. 

In other embodiments of the present invention, the ACH polynucleotides are
utilized to decrease the level of ACH protein or mRNA in transgenic plants, plant tissues, or
plant cells as compared to wild-type plants, plant tissues, or plant cells. One method of
reducing ACH expression utilizes expression of antisense transcripts. Antisense RNA has
been used to inhibit plant target genes in a tissue-specific manner (for example, van der Krol
et al. (1988) Biotechniques 6:958-976). Antisense inhibition has been shown using the entire
cDNA sequence as well as a partial cDNA sequence (for example, Sheehy et al. (1988) Proc.
Natl. Acad. Sci. USA 85:8805-8809; Cannon et al., (1990) Plant Mol. Biol. 15:39-47).
There is also evidence that 3' non-coding sequence fragments and 5' coding sequence
fragments, containing as few as 41 base-pairs of a 1.87 kb cDNA, can play important roles in
antisense inhibition (Ch'ng et al., (1989) Proc. Natl. Acad. Sci. USA 86:10006-10010).

Another method of reducing ACH expression utilizes the phenomenon of
cosuppression or gene silencing (See for example, U.S. Pat. No. 6,063,947). The
phenomenon of cosuppression has also been used to inhibit plant target genes in a
tissue-specific manner. Cosuppression of an endogenous gene using a full-length
cDNA sequence as well as a partial cDNA sequence (730 bp of a 1770 bp cDNA) is known
(for example, Napoli et al. (1990) Plant Cell 2:279-289; van der Krol et al. (1990) Plant Cell
2:291-299; Smith et al. (1990) Mol. Gen. Genetics 224:477-481).



3. Other Host Cells and Systems for Production of ACHs 

The present invention also contemplates that the vectors described above can be
utilized to express plant ACH genes and variants in prokaryotic and eukaryotic cells. In
some embodiments of the present invention, the host cell can be a prokaryotic cell (for
example, a bacterial cell). Specific examples of host cells include, but are not limited to, E.
coli, Salmonella typhimurium, Bacillus subtilis, and various species within the genera
Pseudomonas, Streptomyces, and Staphylococcus. The constructs in host cells can be used in
a conventional manner to produce the gene product encoded by the recombinant sequence.
In some embodiments, introduction of the construct into the host cell can be accomplished
by any suitable method known in the art (e.g., calcium phosphate transfection,
DEAE-Dextran mediated transfection, or electroporation; for example, Davis et al. (1986)
Basic Methods in Molecular Biology). Alternatively, in some embodiments of the present
invention, the polypeptides of the invention can be synthetically produced by conventional
peptide synthesizers.

In some embodiments of the present invention, following transformation of a 
suitable host strain and growth of the host strain to an appropriate cell density, the selected 
promoter is induced by appropriate means (e.g., temperature shift or chemical induction),
and the host cells are cultured for an additional period. In other embodiments of the present
invention, the host cells are harvested (for example, by centrifugation), disrupted by physical
or chemical means, and the resulting crude extract retained for further purification. In still
other embodiments of the present invention, microbial cells employed in expression of 
proteins can be disrupted by any convenient method, including freeze-thaw cycling, 
sonication, mechanical disruption, or use of cell lysing agents. 

It is not necessary that a host organism be used for the expression of the nucleic 
acid constructs of the invention. For example, expression of the protein encoded by a 
nucleic acid construct may be achieved through the use of a cell-free in vitro 
transcription/translation system. An example of such a cell-free system is the commercially 
available TnT™ Coupled Reticulocyte Lysate System (Promega; this cell-free system is 
described in U.S. Patent No. 5,324,637).



4. Purification of ACHs

The present invention also provides methods for recovering and purifying ACHs
from native and recombinant cell cultures including, but not limited to, ammonium sulfate
precipitation, anion or cation exchange chromatography, phosphocellulose chromatography,
hydrophobic interaction chromatography, affinity chromatography, hydroxylapatite
chromatography and lectin chromatography. In other embodiments of the present invention,
protein refolding steps can be used as necessary, in completing configuration of the mature
protein. In still other embodiments of the present invention, high performance liquid
chromatography (HPLC) can be employed as one or more purification steps.

In other embodiments of the present invention, the nucleic acid construct
containing DNA encoding the wild-type or a variant ACH further comprises the addition of
exogenous sequences (in other words, sequences not encoded by the ACH coding region) to
either the 5' or 3' end of the ACH coding region to allow for ease in purification of the
resulting ACH protein (the resulting protein containing such an affinity tag is termed a
"fusion protein"). Several commercially available expression vectors are available for
attaching affinity tags (for example, an exogenous sequence) to either the amino or
carboxy-termini of a coding region. In general these affinity tags are short stretches of amino
acids that do not alter the characteristics of the protein to be expressed (in other words, no
change to enzymatic activities results).

For example, the pET expression system (Novagen) utilizes a vector containing
the T7 promoter operably linked to a fusion protein with a short stretch of histidine residues
at either end of the protein and a host cell that can be induced to express the T7 RNA
polymerase (in other words, a DE3 host strain). The production of fusion proteins containing
a histidine tract is not limited to the use of a particular expression vector and host strain.
Several commercially available expression vectors and host strains can be used to express
protein sequences as a fusion protein containing a histidine tract (for example, the pQE series
[pQE-8, 12, 16, 17, 18, 30, 31, 32, 40, 41, 42, 50, 51, 52, 60 and 70] of expression vectors
(Qiagen), used with host strains M15[pREP4] (Qiagen) and SG13009[pREP4] (Qiagen), can
be used to express fusion proteins containing six histidine residues at the amino-terminus of
the fusion protein). Additional expression systems which utilize other affinity tags are
known to the art.




Once a suitable nucleic acid construct has been made, the ACH may be produced 
from the construct. The examples below and standard molecular biological teachings known 
in the art enable one to manipulate the construct by a variety of suitable methods. 

5. Deletion Mutants of ACHs

The present invention further provides fragments of ACHs. In some embodiments
of the present invention, when expression of a portion of an ACH is desired, it may be
necessary to add a start codon (ATG) to the oligonucleotide fragment containing the desired
sequence to be expressed. It is well known in the art that a methionine at the N-terminal
position can be enzymatically cleaved by the use of the enzyme methionine aminopeptidase
(MAP). MAP has been cloned from E. coli (Ben-Bassat et al., J. Bacteriol. 169:751-757,
1987) and S. typhimurium, and its in vitro activity has been demonstrated on recombinant
proteins (Miller et al., PNAS 84:2718-2722, 1990). Therefore, removal of an N-terminal
methionine, if desired, can be achieved either in vivo by expressing such recombinant
polypeptides in a host producing MAP (for example, E. coli or CM89 or S. cerevisiae), or in
vitro by use of purified MAP. It is contemplated that ACH deletion mutants will be screened
for activity as described above.

6. Use of ACH Nucleic Acids in Directed Evolution 

It is contemplated that the ACH nucleic acids (for example, SEQ ID NOs: 1-4, and
11-12) can be utilized as starting nucleic acids for directed evolution. These techniques can
be utilized to develop ACH variants having desirable properties such as increased synthetic
activity or altered affinity for a particular fatty acid substrate.

In some embodiments, artificial evolution is performed by random mutagenesis
(for example, by utilizing error-prone PCR to introduce random mutations into a given
coding sequence). The critical feature of this method is that the frequency of mutation must
be finely tuned. As a general rule, beneficial mutations are rare, while deleterious mutations
are common, and the combination of a deleterious mutation and a beneficial
mutation often results in an inactive enzyme. The ideal number of base substitutions for a
targeted gene is usually between 1.5 and 5 (Moore and Arnold (1996) Nat. Biotech., 14:458-
67; Leung et al. (1989) Technique, 1:11-15; Eckert and Kunkel (1991) PCR Methods Appl.,
1:17-24; Caldwell and Joyce (1992) PCR Methods Appl., 2:28-33; and Zhao and Arnold
(1997) Nuc. Acids. Res., 25:1307-08). After mutagenesis, the resulting clones are selected
for desirable activity (for example, screened for ACH activity as described above).
Successive rounds of mutagenesis and selection are often necessary to develop enzymes with
desirable properties. It should be noted that only the useful mutations are carried over to the
next round of mutagenesis.
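
By way of illustration only, the tuning of mutation frequency can be sketched numerically. The sketch below assumes that substitutions arise independently and rarely at each base, so that the number of substitutions per clone is approximately Poisson distributed; this model, the function names, and the 1000 bp gene length are assumptions made for the example and are not taken from the cited references.

```python
import math


def per_base_rate(target_mean_substitutions: float, gene_length_bp: int) -> float:
    """Per-base substitution rate that gives the target mean per gene copy."""
    if gene_length_bp <= 0:
        raise ValueError("gene_length_bp must be positive")
    return target_mean_substitutions / gene_length_bp


def poisson_pmf(k: int, mean: float) -> float:
    """Probability that a clone carries exactly k substitutions (Poisson model)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def fraction_with(mean: float, low: int, high: int) -> float:
    """Fraction of clones carrying between low and high substitutions, inclusive."""
    return sum(poisson_pmf(k, mean) for k in range(low, high + 1))


if __name__ == "__main__":
    gene_length_bp = 1000  # hypothetical coding-sequence length
    for mean in (1.5, 3.0, 5.0):
        rate = per_base_rate(mean, gene_length_bp)
        print(
            f"mean={mean}: rate={rate:.4f}/bp, "
            f"unmutated={poisson_pmf(0, mean):.3f}, "
            f"1-5 substitutions={fraction_with(mean, 1, 5):.3f}"
        )
```

Under this model, a low mean leaves a sizeable fraction of clones unmutated, while a high mean pushes many clones toward multiple substitutions, which is the trade-off the paragraph above describes.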

In other embodiments of the present invention, the polynucleotides of the present
invention are used in gene shuffling or sexual PCR procedures (for example, Smith (1994)
Nature, 370:324-25; U.S. Pat. Nos. 5,837,458; 5,830,721; 5,811,238; 5,733,731). Gene
shuffling involves random fragmentation of several mutant DNAs followed by their
reassembly by PCR into full-length molecules. Examples of various gene shuffling
procedures include, but are not limited to, assembly following DNAse treatment, the
staggered extension process (STEP), and random priming in vitro recombination. In the
DNAse-mediated method, DNA segments isolated from a pool of positive mutants are
cleaved into random fragments with DNAse I and subjected to multiple rounds of PCR with
no added primer. The lengths of random fragments approach that of the uncleaved segment
as the PCR cycles proceed, resulting in mutations present in different clones becoming
mixed and accumulating in some of the resulting sequences. Multiple cycles of selection and
shuffling have led to the functional enhancement of several enzymes (Stemmer (1994)
Nature, 370:389-91; Stemmer (1994) Proc. Natl. Acad. Sci. USA, 91:10747-51; Crameri et
al. (1996) Nat. Biotech., 14:315-19; Zhang et al. (1997) Proc. Natl. Acad. Sci. USA,
94:4504-09; and Crameri et al. (1997) Nat. Biotech., 15:436-38).

In some embodiments of the combinatorial mutagenesis approach of the present
invention, the amino acid sequences for a population of ACH homologs or other related
proteins are aligned, preferably to promote the highest homology possible. Such a
population of variants can include, for example, ACH homologs from one or more species,
or ACH homologs from the same species but which differ due to mutation. Amino acids
appearing at each position of the aligned sequences are selected to create a degenerate set of
combinatorial sequences.
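
By way of illustration only, the construction of a degenerate set from aligned sequences can be sketched as follows. The three four-residue sequences are hypothetical toy inputs, not ACH sequences, and the function names are likewise assumptions; the sketch assumes a gap-free alignment of equal-length sequences.

```python
from itertools import product
from typing import List


def residues_per_position(aligned: List[str]) -> List[List[str]]:
    """Collect the distinct residues observed at each aligned position."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return [sorted(set(column)) for column in zip(*aligned)]


def combinatorial_set(aligned: List[str]) -> List[str]:
    """Enumerate every sequence that combines the residues seen at each position."""
    return ["".join(combo) for combo in product(*residues_per_position(aligned))]


if __name__ == "__main__":
    toy_alignment = ["MKLV", "MRLV", "MKIV"]  # hypothetical aligned homologs
    print(residues_per_position(toy_alignment))
    print(combinatorial_set(toy_alignment))
```

The enumerated set contains combinations, such as the fourth one here, that occur in none of the input sequences; the library size is the product of the number of residues allowed at each position, so it grows quickly with the number of variable positions.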

In a preferred embodiment of the present invention, the combinatorial ACH
library is produced by way of a degenerate library of genes encoding a library of
polypeptides including at least a portion of potential ACH protein sequences. For example,
a mixture of synthetic oligonucleotides is enzymatically ligated into gene sequences such
that the degenerate set of potential ACH sequences is expressible as individual
polypeptides, or alternatively, as a set of larger fusion proteins (for example, for phage
display) containing the set of ACH sequences therein.

There are many ways in which the library of potential ACH homologs can be
generated from a degenerate oligonucleotide sequence. In some embodiments, chemical
synthesis of a degenerate gene sequence is carried out in an automatic DNA synthesizer, and
the synthetic genes are ligated into an appropriate gene for expression. The purpose of a
degenerate set of genes is to provide, in one mixture, all of the sequences encoding the
desired set of potential ACH sequences. The synthesis of degenerate oligonucleotides is well
known in the art (for example, Narang, Tetrahedron 39:39, 1983; Itakura et al., Recombinant
DNA, Proc 3rd Cleveland Sympos. Macromol., Walton, ed., Elsevier, Amsterdam, pp 273-
289, 1981; Itakura et al., Annu. Rev. Biochem. 53:323, 1984; Itakura et al., Science
198:1056, 1984; and Ike et al., Nucleic Acid Res. 11:477, 1983). Such techniques have been
employed in the directed evolution of other proteins (for example, Scott et al., Science
249:386-390, 1990; Roberts et al., PNAS 89:2429-2433, 1992; Devlin et al., Science
249:404-406, 1990; Cwirla et al., PNAS 87:6378-6382, 1990; as well as U.S. Pat. Nos.
5,223,409, 5,198,346, and 5,096,815).

A wide range of techniques are known in the art for screening gene products of
combinatorial libraries generated by point mutations, and for screening cDNA libraries for
gene products having a particular property of interest. Such techniques are generally
adaptable for rapid screening of gene libraries generated by the combinatorial mutagenesis of
ACH homologs. The most widely used techniques for screening large gene libraries
typically comprise cloning the gene library into replicable expression vectors, transforming
appropriate cells with the resulting library of vectors, and expressing the combinatorial genes
under conditions such that detection of a desired activity facilitates relatively easy isolation
of the vector encoding the gene whose product was detected. The illustrative assays
described below are amenable to high through-put analysis as necessary to screen large
numbers of degenerate sequences created by combinatorial mutagenesis techniques.

In some embodiments of the present invention, the gene library is expressed as a
fusion protein on the surface of a viral particle. For example, foreign peptide sequences can
be expressed on the surface of infectious phage in the filamentous phage system, thereby
conferring two significant benefits. First, since these phage can be applied to affinity
matrices at very high concentrations, a large number of phage can be screened at one time.
Second, since each infectious phage displays the combinatorial gene product on its surface, if
a particular phage is recovered from an affinity matrix in low yield, the phage can be
amplified by another round of viral replication. The group of almost identical E. coli
filamentous phages M13, fd, and f1 are most often used in phage display libraries, as either of
the phage gIII or gVIII coat proteins can be used to generate fusion proteins without
disrupting the ultimate packaging of the viral particle (for example, WO 90/02909; WO
92/09690; Marks et al., J. Biol. Chem., 267:16007-16010, 1992; Griffiths et al., EMBO J.,
12:725-734, 1993; Clackson et al., Nature, 352:624-628, 1991; and Barbas et al., PNAS
89:4457-4461, 1992).

In another embodiment of the present invention, the recombinant phage antibody
system (e.g., RPAS, Pharmacia Catalog number 27-9400-01) is modified for use in
expressing and screening ACH combinatorial libraries. The pCANTAB 5 phagemid of the
RPAS kit contains the gene encoding the phage gIII coat protein. In some embodiments of
the present invention, the ACH combinatorial gene library is cloned into the phagemid
adjacent to the gIII signal sequence such that it will be expressed as a gIII fusion protein. In
other embodiments of the present invention, the phagemid is used to transform competent E.
coli TG1 cells after ligation. In still other embodiments of the present invention, transformed
cells are subsequently infected with M13KO7 helper phage to rescue the phagemid and its
candidate ACH gene insert. The resulting recombinant phage contain phagemid DNA
encoding a specific candidate ACH protein and display one or more copies of the
corresponding fusion coat protein. In some embodiments of the present invention, the
phage-displayed candidate proteins that are capable of, for example, binding a particular
acyl-CoA, are selected or enriched by panning. The bound phage is then isolated, and if the
recombinant phage express at least one copy of the wild type gIII coat protein, they will
retain their ability to infect E. coli. Thus, successive rounds of reinfection of E. coli and
panning greatly enrich for ACH homologs, which are then screened for further biological
activities.

In light of the present disclosure, other forms of mutagenesis generally applicable
will be apparent to those skilled in the art in addition to the aforementioned rational
mutagenesis based on conserved versus non-conserved residues. For example, ACH
homologs can be generated and screened using, for example, alanine scanning mutagenesis,
linker scanning mutagenesis, or saturation mutagenesis.

7. Chemical Synthesis of ACH Polypeptides

In an alternate embodiment of the invention, the coding sequence of an ACH is
synthesized, whole or in part, using chemical methods well known in the art (e.g., Caruthers
et al., Nuc. Acids Res. Symp. Ser., 7:215-233, 1980; Crea and Horn, Nuc. Acids Res.,
9:2331, 1980; Matteucci and Caruthers, Tetrahedron Lett., 21:719, 1980; and Chow and
Kempe, Nuc. Acids Res., 9:2807-2817, 1981). In other embodiments of the present
invention, the protein itself is produced using chemical methods to synthesize either a
full-length ACH amino acid sequence or a portion thereof. For example, peptides can be
synthesized by solid phase techniques, cleaved from the resin, and purified by preparative
high performance liquid chromatography (for example, Creighton, Proteins Structures and
Molecular Principles, W H Freeman and Co, New York N.Y., 1983). In other embodiments
of the present invention, the composition of the synthetic peptides is confirmed by amino
acid analysis or sequencing (for example, Creighton, supra).

Direct peptide synthesis can be performed using various solid-phase techniques
(Roberge et al., Science 269:202-204, 1995) and automated synthesis may be achieved, for
example, using the ABI 431A Peptide Synthesizer (Perkin Elmer) in accordance with the
instructions provided by the manufacturer. Additionally, the amino acid sequence of ACH,
or any part thereof, may be altered during direct synthesis and/or combined using chemical
methods with other sequences to produce a variant polypeptide.

EXPERIMENTAL 

The following examples are provided in order to demonstrate and further illustrate
certain preferred embodiments and aspects of the present invention and are not to be
construed as limiting the scope thereof.

In the experimental disclosure which follows, the following abbreviations apply:
°C (degrees Centigrade); rpm (revolutions per minute); BSA (bovine serum albumin); H2O
(water); HCl (hydrochloric acid); aa (amino acid); bp (base pair); kb (kilobase pair); kD
(kilodaltons); gm (grams); µg (micrograms); mg (milligrams); ng (nanograms);
µl (microliters); ml (milliliters); mm (millimeters); nm (nanometers); µm (micrometers); M
(molar); mM (millimolar); µM (micromolar); U (units); V (volts); MW (molecular weight);
sec (seconds); min(s) (minute/minutes); hr(s) (hour/hours); MgCl2 (magnesium chloride);
NaCl (sodium chloride); OD280 (optical density at 280 nm); OD600 (optical density at 600
nm); PAGE (polyacrylamide gel electrophoresis); PBS (phosphate buffered saline [150 mM
NaCl, 10 mM sodium phosphate buffer, pH 7.2]); BCS (biodegradable counting scintillant);
PCR (polymerase chain reaction); PEG (polyethylene glycol); PMSF (phenylmethylsulfonyl
fluoride); RT-PCR (reverse transcription PCR); SDS (sodium dodecyl sulfate); Tris
(tris(hydroxymethyl)aminomethane); w/v (weight to volume); v/v (volume to volume);
Amersham (Amersham Life Science, Inc., Arlington Heights, IL); ICN (ICN
Pharmaceuticals, Inc., Costa Mesa, CA); ATCC (American Type Culture Collection,
Rockville, MD); BioRad (BioRad, Richmond, CA); GIBCO (Life Technologies, Inc.,
Gaithersburg, MD); Invitrogen (Invitrogen Corp., San Diego, CA); Promega (Promega
Corp., Madison, WI); New England Biolabs (New England Biolabs, Inc., Beverly, MA);
Novagen (Novagen, Inc., Madison, WI); Pharmacia (Pharmacia, Inc., Piscataway, NJ);
Sigma (Sigma Chemical Co., St. Louis, MO); and Stratagene (Stratagene Cloning Systems,
La Jolla, CA).

EXAMPLE 1 
Primers Used in PCRs

The following Table lists the primers used in the various PCRs conducted during
the development of the present invention.



Table 1. Primers Used in Amplification of ACH

Primer Name      | Sequence (5' to 3')               | SEQ ID NO:   | Use
-----------------+-----------------------------------+--------------+---------------------------------------------
For ACH1
ACH1-3           | GGAAGACATACTATATCTAG              | SEQ ID NO:13 | Used to amplify the first partial cDNA for ACH1
ACH1-4           | GTACCTCTCCCTTTCTGTTG              | SEQ ID NO:14 | (as above)
ACH1-LncoI       | CATGAACACTGAATCAGTTGTCG           | SEQ ID NO:15 | Used to amplify the first full-length cDNA for sticky-end insertion into pET-24d at an NcoI site and an XhoI site
ACH1-SncoI       | AACACTGAATCAGTTGTCGAG             | SEQ ID NO:16 | (as above)
ACH1-LxhoI       | TCGAGGGCTTCATAGCTTGGCC            | SEQ ID NO:17 | (as above)
ACH1-SxhoI       | GGGCTTCATAGCTTGGCCCC              | SEQ ID NO:18 | (as above)
ACH1-NdeI 5'     | ATCCGGTAAGTACTCATATGAACACTGAATCAG | SEQ ID NO:19 | Used to characterize ACH1 protein by amplifying cDNA by PCR from existing cDNA clones
AHC1XhisextS'    | ATCCTATGGCTTCACTCGAGCTTGGCCCCGAAG | SEQ ID NO:20 | (as above)
GSP1-1           | ACCCAAGCAGTACACACATTCAAGTACA      | SEQ ID NO:21 | Used to screen T-DNA mutant Arabidopsis populations for ACH1 knock out mutant
GSP1-4           | CGTCGTTAACCTGTAAAATGAACCACTG      | SEQ ID NO:22 | (as above)
For ACH2
oiiyi            | ACCCjACCO 1 C 1 AACjACjCC         | SEQ ID NO:23 | Used to amplify a portion of the ACH2 gene and cDNA; genomic PCR fragment used as a probe to screen a genomic library
bll95            | TGATCTTTTTATTCGTCG                | SEQ ID NO:24 | (as above)
ACH2-5P          | CGCCATGAACACCGAGTCAG              | SEQ ID NO:25 | Used to amplify the first full-length cDNA
ACH2-3P          | GAGCCATTCAGAGCTTCGACG             | SEQ ID NO:26 | (as above)
ACH2-SXbaIMBP3'  | ATCAGAGCTTCGACGTGCC               | SEQ ID NO:27 | Used to create sticky-end PCR products for putting ACH2 into pMAL-c2G, a maltose-binding protein fusion expression vector
ACH2-LXbaIMBP3'  | CTAGATCAGAGCTTCGACGTG             | SEQ ID NO:28 | (as above)
ACH2-SEcoRIMBP5' | CATGAACACCGAGTCAGTTG              | SEQ ID NO:29 | (as above)
ACH2-LEcoRIMBP5' | AATTCATGAACACCGAGTCAG             | SEQ ID NO:30 | (as above)
GSP2-2           | ATTAAGTAGGAGGAAAATCCATCGTGACAG    | SEQ ID NO:31 | Used to screen T-DNA mutant Arabidopsis populations for ACH2 knock out mutant
GSP2-3           | CGAGAAGAAATCACAGAATTGCTCAGATTAC   | SEQ ID NO:32 | (as above)
For ACH4
ACH4-3           | CCAGGTATGTACCATTCACCTG            | SEQ ID NO:33 | Used to amplify the first partial cDNA
ACH4-4           | GATATGACGAGCTTCTTCCTCTG           | SEQ ID NO:34 | (as above)
ACH4LecoRi       | AATTATGAATTCCCCAAGAC              | SEQ ID NO:35 | Used to amplify the sticky-end products for inserting ACH4 into a GFP fusion expression vector
ACH4sEcoRi       | ATGAATTCCCCAAGACCC                | SEQ ID NO:36 | (as above)
ACH4LbamHI       | GATCCTGAAGAATTGTGCC               | SEQ ID NO:37 | (as above)
ACH4SBamHI       | CTGAAGAATTGTGCCTAC                | SEQ ID NO:38 | (as above)
For ACH5
ACH5LEcoRi       | AATTATGAGATCTTCAGCGGG             | SEQ ID NO:39 | Used to amplify the sticky-end products for inserting ACH5 into a GFP fusion expression vector
ACH5LEcoRI       | ATGAGATCTTCAGCGGGA                | SEQ ID NO:40 | (as above)
ACH5LBamHI       | GATCAGGCAATGAGATGGG               | SEQ ID NO:41 | (as above)
ACH5SBamHI       | AGGCAATGAGATGGGTCTC               | SEQ ID NO:42 | (as above)
ACH5-5P          | ATGAGATCTTCAGCGGGAAAG             | SEQ ID NO:43 | Amplified first full-length cDNA
ACH5-4           | TCAAGGCAATGAGATGGGTC              | SEQ ID NO:44 | (as above)
GSP5-5           | ACTAGGCACTACTTAGGCAGGAATGAAAG     | SEQ ID NO:45 | Used to screen T-DNA mutant Arabidopsis populations for ACH5 knock out mutant
GSP5-6           | TCTATCACAGAGGGAAAGAATGATCAAAC     | SEQ ID NO:46 | (as above)
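
By way of illustration only, the relationship between the long and short sticky-end primers in Table 1 can be checked programmatically. The sketch assumes that each long primer equals the 4-nucleotide 5' overhang left by the named restriction enzyme followed by a core sequence, and that the matching short primer begins with that same core; this reading of the primer names, the pairing of primers, and the function name `is_sticky_pair` are interpretations introduced here, not statements from the Table. The overhangs listed are the standard 5' overhangs of these enzymes.

```python
# Standard 4-nt 5' overhangs produced by each restriction enzyme.
OVERHANG = {
    "NcoI": "CATG",
    "XhoI": "TCGA",
    "XbaI": "CTAG",
    "EcoRI": "AATT",
    "BamHI": "GATC",
}

# (enzyme, long primer, short primer), sequences as listed in Table 1.
PAIRS = [
    ("NcoI", "CATGAACACTGAATCAGTTGTCG", "AACACTGAATCAGTTGTCGAG"),  # SEQ ID NO:15/16
    ("XhoI", "TCGAGGGCTTCATAGCTTGGCC", "GGGCTTCATAGCTTGGCCCC"),    # SEQ ID NO:17/18
    ("XbaI", "CTAGATCAGAGCTTCGACGTG", "ATCAGAGCTTCGACGTGCC"),      # SEQ ID NO:28/27
    ("EcoRI", "AATTCATGAACACCGAGTCAG", "CATGAACACCGAGTCAGTTG"),    # SEQ ID NO:30/29
    ("EcoRI", "AATTATGAATTCCCCAAGAC", "ATGAATTCCCCAAGACCC"),       # SEQ ID NO:35/36
    ("BamHI", "GATCCTGAAGAATTGTGCC", "CTGAAGAATTGTGCCTAC"),        # SEQ ID NO:37/38
    ("EcoRI", "AATTATGAGATCTTCAGCGGG", "ATGAGATCTTCAGCGGGA"),      # SEQ ID NO:39/40
    ("BamHI", "GATCAGGCAATGAGATGGG", "AGGCAATGAGATGGGTCTC"),       # SEQ ID NO:41/42
]


def is_sticky_pair(enzyme: str, long_primer: str, short_primer: str) -> bool:
    """True if long = overhang + core and the short primer starts with that core."""
    overhang = OVERHANG[enzyme]
    if not long_primer.startswith(overhang):
        return False
    core = long_primer[len(overhang):]
    return short_primer.startswith(core)


if __name__ == "__main__":
    for enzyme, long_primer, short_primer in PAIRS:
        print(enzyme, long_primer, short_primer,
              is_sticky_pair(enzyme, long_primer, short_primer))
```

A pair that fails this check would point either to a transcription error in the sequence or to a primer that was not designed on this pattern.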

EXAMPLE 2
Acyl-CoA Thioesterase (ACH) Assay

In this Example, the methods used to determine acyl-CoA thioesterase activity in
vitro are described. Two different assays have been utilized. In one, enzyme activity is
determined from the amount of labeled fatty acid released from radiolabeled acyl-CoA. In
the other, enzyme activity is determined from the amount of free CoA released from
acyl-CoA by detecting the presence of free thiols.

Detection of labeled free fatty acids

In these assays, radiolabeled acyl-CoA is used as a substrate in which the [14C]
label is on the fatty acid portion of the molecule, and the enzyme activity is measured by the
amount of labeled free fatty acid released by the action of the enzyme. Typically, the
substrate is palmitoyl-CoA (16:0-CoA) or oleoyl-CoA (18:1-CoA). Because a pH above 9
causes base-catalyzed hydrolysis of the substrate, the reaction pH must be kept below 9;
in most experiments the enzyme activity is measured between pH 7 and 9. The reactions are
conducted at room temperature. The reaction volume is generally kept from about 32 µl to
100 µl. Although the reaction times may vary, typical incubation times are 10-15 minutes,
which is sufficient for even small amounts of protein.

The protein solution to be assayed, which can be purified protein over-expressed
in E. coli or unpurified protein in a plant extract or a plant subfraction, such as purified
chloroplasts, is brought to the desired volume with buffer (20 mM Tris-HCl, pH 7.2, 200
mM NaCl and 1 mM EDTA). The reaction is initiated by adding the substrate to the
solution, typically to a total volume of 32 µl, and the reaction mixture is incubated at room
temperature for the desired period of time. The reaction is terminated by adding 100 µl of 90
percent isopropanol/10 percent acetic acid. The free fatty acids are extracted by adding 900 µl
hexane to the solution, vortexing thoroughly to mix the solution, and allowing the hexane to
separate from the aqueous phase. An 850 µl aliquot of the hexane phase is added to 3 ml
BCS, and the amount of radioactivity determined in a scintillation counter.

The level of acyl-CoA thioesterase activity is determined by the amount of
radioactivity that is found in the hexane phase. As the enzyme cleaves fatty acyl-CoAs, the
free fatty acids that are released become insoluble in the aqueous reaction mixture and move
into the hydrophobic hexane phase when it is added. Unhydrolyzed acyl-CoA remains in the
aqueous reaction mixture.
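
As a rough illustration of how the counts in the hexane aliquot relate to the amount of fatty acid released, the following sketch scales the counted 850 µl aliquot up to the 900 µl of hexane added. It rests on several assumptions that are not stated in the text: a no-enzyme blank is subtracted, the specific radioactivity of the substrate is known, the hexane phase volume equals the volume added, and extraction is quantitative. The function name and parameters are hypothetical.

```python
def fatty_acid_released_nmol(aliquot_dpm, blank_dpm, specific_activity_dpm_per_nmol,
                             hexane_added_ul=900.0, aliquot_ul=850.0):
    """Estimate nmol of free fatty acid released in one reaction.

    aliquot_dpm: counts measured in the hexane aliquot of the enzyme reaction
    blank_dpm:   counts measured in the same aliquot of a no-enzyme blank (assumed control)
    specific_activity_dpm_per_nmol: specific radioactivity of the acyl-CoA (assumed known)
    """
    if specific_activity_dpm_per_nmol <= 0:
        raise ValueError("specific activity must be positive")
    net_dpm = aliquot_dpm - blank_dpm
    # Only part of the hexane phase is counted, so scale up to the whole phase.
    total_dpm = net_dpm * hexane_added_ul / aliquot_ul
    return total_dpm / specific_activity_dpm_per_nmol


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(fatty_acid_released_nmol(8600.0, 100.0, 1000.0))
```

Dividing the result by the incubation time and the amount of protein would give a specific activity; those values are experiment-specific and are not fixed here.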
Detection of free thiols

In this assay, the enzyme activity is measured as the amount of free CoA released
by the action of the enzyme; free CoA is detected by the presence of free thiols in solution.
When the thioester bond linking the Coenzyme A and the fatty acid is cleaved, the free thiol
created on the Coenzyme A can be detected in the presence of 5,5'-dithio-bis-(2-nitrobenzoic
acid), also called DTNB or Ellman's reagent, by measuring the absorbance at 412 nm. This
assay has been commonly used for characterization of acyl-CoA thioesterases from other
species.

The following conditions are typically used in this assay: 

Reagent                 Amount (µl)    Final concentration
Acyl-CoA (0.1 mM)       180            20 µM
BSA (1 mg/ml)           180            200 µg/ml
DTNB (1 mM)             180            200 µM
KPi, 50 mM, pH 8.0      357
Enzyme (1 µg/ml)

The substrates used in these assays are fatty acyl-CoA molecules which vary from
10 to 20 carbons in length.
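
The final concentrations in the table follow from simple dilution into the 900 µl assay volume used in this Example. The sketch below is only an illustration of that arithmetic. The 3 µl enzyme volume is not listed in the table; it is inferred here from the 103 µl of a 400 µl buffer plus 12 µl enzyme mixture described in the text, and the helper names are hypothetical.

```python
TOTAL_UL = 900.0  # assay volume; the text refers to a 900 µl format


def final_concentration(stock_conc, volume_ul, total_ul=TOTAL_UL):
    """Dilution: final concentration in the same units as the stock."""
    return stock_conc * volume_ul / total_ul


def enzyme_volume_ul(mix_added_ul=103.0, buffer_ul=400.0, enzyme_ul=12.0):
    """Enzyme volume carried in the buffer/enzyme mixture added per assay (inferred)."""
    return mix_added_ul * enzyme_ul / (buffer_ul + enzyme_ul)


if __name__ == "__main__":
    print("acyl-CoA (µM):", final_concentration(100.0, 180.0))   # 0.1 mM stock
    print("BSA (µg/ml):", final_concentration(1000.0, 180.0))    # 1 mg/ml stock
    print("DTNB (µM):", final_concentration(1000.0, 180.0))      # 1 mM stock
    # Volume check: three 180 µl reagents + 357 µl buffer + enzyme
    print("total (µl):", 3 * 180.0 + 357.0 + enzyme_volume_ul())
```

With these inputs the arithmetic gives 20 µM acyl-CoA, 200 µg/ml BSA and 200 µM DTNB, and the listed volumes plus the inferred enzyme volume sum to 900 µl.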







The acyl-CoA thioesterase activity assays were all done in 1.5 ml disposable
cuvettes using an HP Diode Array Spectrophotometer and observing the change in
absorbance of the assay mixture at 412 nm. A 'zero' cuvette was first created by adding all of
the reagents together except for the enzyme. The spectrophotometer was zeroed using this
blank cuvette. For enzymatic assay reactions, all of the reagents except for the enzyme and
100 µl of the buffer were mixed in the cuvette; sufficient buffer and enzyme for 4 assays
(400 µl and 12 µl, respectively) were mixed together in a separate tube. Readings were taken
from the cuvette without the enzyme at 412 nm at one second intervals for
approximately 10 seconds before 103 µl of the buffer/enzyme mixture was added to the
reaction cuvette. Additional readings were then taken at one second intervals for a total of
60 seconds. The specific activity of the enzyme was determined by observing the change in
absorbance at 412 nm and using the molar extinction coefficient of DTNB (1.36 x 10^4 M^-1
cm^-1). Each assay was repeated three times.
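
The conversion from a change in absorbance at 412 nm to a specific activity can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure actually used: the 1 cm path length, the 0.9 ml assay volume, the least-squares slope over the one-second readings, and all function names are assumptions. Only the extinction coefficient is taken from the text.

```python
EPSILON_412 = 1.36e4  # M^-1 cm^-1, extinction coefficient given in the text


def slope_per_min(times_s, absorbances):
    """Least-squares slope of A412 versus time, converted to absorbance units per minute."""
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_a = sum(absorbances) / n
    num = sum((t - mean_t) * (a - mean_a) for t, a in zip(times_s, absorbances))
    den = sum((t - mean_t) ** 2 for t in times_s)
    return 60.0 * num / den


def specific_activity(delta_a412_per_min, enzyme_mg, volume_ml=0.9, path_cm=1.0):
    """Return µmol of thiol released per minute per mg protein (Beer-Lambert law)."""
    molar_per_min = delta_a412_per_min / (EPSILON_412 * path_cm)  # mol/L per min
    umol_per_min = molar_per_min * (volume_ml / 1000.0) * 1e6     # L -> mol -> µmol
    return umol_per_min / enzyme_mg


if __name__ == "__main__":
    # Hypothetical readings: absorbance rising by 0.001 per second.
    times = [0, 1, 2, 3, 4, 5]
    readings = [0.001 * t for t in times]
    rate = slope_per_min(times, readings)
    print(rate, specific_activity(rate, enzyme_mg=0.003))
```

Averaging the three repeats mentioned above would be a natural next step; it is omitted here to keep the sketch short.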

Measuring the enzyme activity of acyl-CoA thioesterase is difficult because the
acyl-CoA molecules exhibit substrate inhibition, presumably due to their detergent-like
characteristics. The concentration at which the acyl-CoA molecules begin to inhibit the
thioesterase activity of acyl-CoA thioesterase differs depending on the chain length and
saturation level of the fatty acid moiety. Longer chain-lengths have a greater inhibitory
capacity, which drops as double bonds are introduced in the fatty acid portion.

The BSA concentration also has a significant effect on the enzyme activity; these
effects vary, depending upon the substrate fatty acid. Therefore, two different conditions
were used during these assays. The first set of conditions maintained a constant BSA
concentration for every substrate tested (200 µg/ml BSA and 20 µM substrate). The second
set of conditions utilized the optimal BSA concentration for each substrate; the optimal BSA
concentrations were determined in an assay as described above, but in one-third the volume,
or 300 µl. The optimization assays were conducted in a 96-well plate format using a plate
reader that measured the change in absorbance of the assay mixture at 412 nm. In these
assays, the substrate was kept constant at 20 µM acyl-CoA, and the BSA concentration varied
to determine the concentration at which the enzyme was most active for each substrate.
Once this was determined, the enzyme assays were conducted in the 900 µl format under
'optimal' BSA conditions.



EXAMPLE 3

Cloning and Characterization of ACH1

In this Example, experiments to clone and characterize ACH1 are described.
In the cloning experiments, primers developed from the genomic sequence were used to
amplify a portion of the full-length cDNA from size-selected cDNA libraries of 1-2 kb and
2-3 kb. These PCRs resulted in products of two different sizes from the different
libraries. This suggested that differential splicing of the mRNA for ACH1 occurs. The
partial cDNAs were subcloned into pGEM-T Easy (Promega), a T-overhang vector, using
the manufacturer's instructions. These partial cDNAs represented the thioesterase portion of
the protein, but the true 5' end, which encodes the additional N-terminal region, remained
unidentified at this point.

The discovery of an additional predicted open reading frame (ORF) positioned
very closely to what had been considered the ACH1 start codon resulted in the identification
of the 5' end much further upstream than originally thought. This region had gone
unobserved as it bore no resemblance to other known acyl-CoA thioesterases. This extra bit
of sequence, which corresponded to two extra exons of coding region, allowed the
identification of cDNA clones in the database that encoded the entire ACH1 cDNA. Two
clones were identified, one of which was shorter than the other and had an internal nonsense
codon. These clones, obtained from the AIMS database, consisted of the ACH1 cDNA
sequence in pZL1, a cloning vector provided by Gibco. The designation of the sequence
that corresponds to the full-length sequence is "GSDB:S: 174537|H3694|H36947| 15076
Lambda-PRL2 Arabidopsis thaliana, cDNA clone 181C20T7, length = 633." The
designation of the sequence of the shorter clone is "GSDB:S: 174253|H37158|H37158|15287
Lambda-PRL2 Arabidopsis thaliana, cDNA clones 184P8T7, length = 343." The longer
clone was determined to be the true ACH1 cDNA. This cDNA is shown in Figure 1 (SEQ
ID NO:1).

Genefinder was also used in an effort to discover plant acyl-CoA thioesterase
genes. However, Genefinder missed some of the splice sites in the gene, so that the true
amino acid sequence was initially ambiguous. It remains impossible to determine the entire
correct cDNA sequence of a plant acyl-CoA thioesterase gene by searching in any database
with other known acyl-CoA thioesterases, as the inventors have discovered that plant
acyl-CoA thioesterases possess an additional unique amino terminal sequence. However, it was
possible to determine that the two predicted protein regions (as described above) were part of
the same protein, after it was observed that they were positioned very closely together,
because it is unlikely that two separate genes would be positioned so closely together. By
searching the Arabidopsis database with one of the regions, it was possible to identify two
cDNA sequences that had been deposited. After obtaining these sequences and sequencing
them, the true sequence of ACH1 was determined.

ACH1 Cloning and Overexpression

The full-length cDNA (in other words, SEQ ID NO:1) was used as a template to
make sticky-ended PCR products, as known in the art (See for example, Zeng (1998)
Biotechnology, 25:206-208), which allows the creation of an insert that has overhanging
sticky-ends, without need of using any restriction enzymes. Briefly, this was accomplished
by amplifying two separate PCR products that were identical except at the 5' and 3' ends.
One PCR product was longer at the 5' end, corresponding to a sticky-end overhang that
results when a restriction enzyme of choice cleaves a DNA strand. The other PCR product
was similarly longer at the 3' end, corresponding to a sticky-end overhang that would be
created by a different restriction enzyme when it cleaves a DNA strand. The two PCR
products were mixed together, denatured at high temperature, and then reannealed as the
temperature was slowly dropped. Thus, 25 percent of the reannealed product so formed has
the correct 5' and 3' overhangs for cloning into a vector that has been treated with restriction
enzymes corresponding to the sticky-end overhangs (See for example, Figure 1 of Zeng,
supra). After the product was annealed, a vector cut with the appropriate restriction enzymes
(New England Biolabs) and treated with calf intestinal phosphatase to prevent self-ligation
was added. The reannealed sticky-end PCR product was then ligated into the vector using
New England Biolabs T4 DNA ligase and buffer. Subsequently, an aliquot of the mixture
(usually 15 µl of the ligation reaction) was used to transform a cloning strain of E. coli to
produce large amounts of the DNA.
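
The 25 percent figure can be illustrated by enumerating the four ways a top strand and a bottom strand from the two PCR products can pair on reannealing. The sketch below is only a simplified model: it assumes equal amounts of the two products, random strand pairing, and 4-nucleotide extensions, and the coordinate representation and names are hypothetical.

```python
from fractions import Fraction
from itertools import product

OVERHANG = 4      # nt added by each long primer (assumed 4-nt sites)
INSERT_LEN = 100  # arbitrary insert length; the result does not depend on it

# Each strand is (left_end, right_end) on a shared coordinate axis.
# Product A carries the extension on the left end, product B on the right end.
PRODUCT_A = {"top": (-OVERHANG, INSERT_LEN), "bottom": (-OVERHANG, INSERT_LEN)}
PRODUCT_B = {"top": (0, INSERT_LEN + OVERHANG), "bottom": (0, INSERT_LEN + OVERHANG)}


def end_types(top, bottom):
    """Classify both duplex ends. The top strand's 5' end is on the left,
    the bottom strand's 5' end is on the right."""
    if top[0] < bottom[0]:
        left = "5' overhang"
    elif top[0] > bottom[0]:
        left = "3' overhang"
    else:
        left = "blunt"
    if bottom[1] > top[1]:
        right = "5' overhang"
    elif bottom[1] < top[1]:
        right = "3' overhang"
    else:
        right = "blunt"
    return left, right


def clonable_fraction():
    """Fraction of reannealed duplexes with 5' overhangs at both ends."""
    duplexes = [end_types(a["top"], b["bottom"])
                for a, b in product((PRODUCT_A, PRODUCT_B), repeat=2)]
    good = sum(1 for d in duplexes if d == ("5' overhang", "5' overhang"))
    return Fraction(good, len(duplexes))


if __name__ == "__main__":
    print(clonable_fraction())
```

In this model two of the four pairings regenerate the blunt parental products, one yields 3' extensions at both ends, and one yields the 5' overhangs that match a restriction-digested vector.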

In order to characterize ACH1 further, it was desirable to over-express the protein
in a host cell; the over-expressed protein could then be analyzed, as by purification and
activity assays. To that end, the sticky-ended PCR products were put into pET-24d obtained
from Novagen, using New England Biolabs cloning tools (for example, buffer, ligase, etc.).
This was determined to correspond to the native sequence without any additional amino
acids. The expression vector (pET-24d/ACH1) was transformed into DH10B (for cloning
purposes) and later into BL-21 Gold (Stratagene) (for over-expression). The BL-21 host was
used as it provides tight control over "leaky" expression. The expression vector resulted in
over-expression of the inserted gene; however, initial attempts to purify the over-expressed
protein from this vector, either by itself or with the addition of a tag of six histidines, were
unsuccessful. It was subsequently determined that it was necessary to remove a stop codon,
and the experiments were repeated as described below.

ACH1 was then over-expressed as a fusion protein with a six-histidine tag
(ACH1/6His) from a second vector. In order to characterize the ACH1 protein, the ACH1
cDNA was amplified by PCR from existing cDNA clones using the following primers:

ACH1NdeI5': ATCGGGTAAGTACTCATATGAAGAGTGAATGAG (SEQ ID NO:19)
ACH1 Xhis ext 3': ATGGTATGGCTTGACTGGAGGTTGGGGCGGAAG (SEQ ID
NO:20)

The product of this PCR reaction, which contained restriction sites NdeI at the 5' end and
XhoI at the 3' end, was cloned into the pET-24c (Invitrogen) vector in-frame with a
6-histidine tag. The stop codon of the cDNA was deleted so that the protein would terminate
with a series of six histidines coded by the vector. This vector, designated
ACH1-6His/pET-24c, was then introduced into BL-21 Gold (DE3) E. coli chemically competent cells
(Stratagene) using methods known in the art.
Cells were grown in liquid culture at 37°C to an optical density at 600 nm of
0.5-0.6, then cooled to room temperature. IPTG was added to a final concentration of 800 µM
and the cells were shaken at 300 rpm overnight at room temperature. The cells were then
collected by centrifugation and frozen overnight at -20°C or frozen in liquid nitrogen. After
thawing, the cells were resuspended in 1X Binding Buffer (all buffers used in this
purification process were made according to instructions in the Novagen His-Bind Kit
manual except that the buffers were pH 6.0 and not pH 7.9). The resuspended cells were
lysed by sonication and subsequently centrifuged at 20,000 x g for 30 min to separate the
insoluble and soluble cell fractions.

Purification of soluble ACH1/6His was then attempted using Ni2+ affinity column
chromatography according to the protocols outlined by Novagen. However, ACH1/6His could
not be purified using this technique, because the protein appeared to be present only in
insoluble inclusion bodies.

The entire expression and purification process was repeated, except that the
induced cells were shaken overnight at 4°C rather than at room temperature. Growing the
cells at the lower temperature was an attempt to solve possible folding problems that the
over-expressed protein might be experiencing at higher temperatures. However, this attempt
at obtaining more soluble protein was also unsuccessful.
Activity of ACH1

The over-expressed ACH1 protein, which was expressed as a fusion
protein with a 6-histidine tag as described above, was assayed for acyl-CoA thioesterase
activity by utilizing radiolabeled acyl-CoA substrates, as described previously. The enzyme
source was cell lysate prepared from the transfected cells. However, the activity of the lysate
from cells over-expressing ACH1 was no greater than the level of activity in cells
transfected with a control plasmid.

Isolation of ACH1 mutant Arabidopsis

A population of thousands of Arabidopsis lines that carry random genomic
T-DNA insertions, housed at the University of Wisconsin, was screened for ACH1 mutants.
Using PCR, it is possible to screen through the thousands of plants in order to find a single
line that carries a T-DNA insertion in the gene of interest. Primers were constructed that
would be appropriate for screening the DNA collection for an ACH1 mutant, in accordance
with the recommendations of the Arabidopsis Knockout Facility. The following primer
sequences were used for the screening:

GSP1-1: ACCCAAGCAGTACACACATTCAAGTACA (SEQ ID NO:21)
GSP1-4: CGTCGTTAACCTGTAAAATGAACCACTG (SEQ ID NO:22)

These primers were provided to the Knock Out facility at Wisconsin where they
were used for screening the ALPHA population of mutants (which are the
kanamycin-resistant lines). PCR reactions conducted at Wisconsin were screened by Southern blotting
in order to identify pools of DNA that contained DNA from an ACH1 knockout plant. The
primary screen indicated that there existed at least two ACH1 T-DNA mutants, one in pool
#7 and one in pool #9. This screen involved Southern blotting the PCR reactions sent from
Wisconsin, which was performed with a DIG-labeled probe (Boehringer Mannheim)
following the manufacturer's instructions for probe construction and probing techniques. The
template for the probe was the genomic fragment amplified by GSP1-1 and GSP1-4.

The secondary screen involved another round of PCR reactions, run at Wisconsin
with DNA subpools of the primary pools. This revealed that the mutant in pool #7 was in
subpool #57, and the mutant in pool #9 was in subpool #81. Seed pools for these two
subpools were ordered from the Arabidopsis Biological Resource Center. Following the
protocols designed by the Knockout Facility, these seed pools were screened by PCR. A
mutant was eventually identified in seed pool 2003 of the subpool #81. The T-DNA
insertion is located at bp 1746 of the ACH1 gene in the 9th exon. In the other mutant, in
subpool #57, the T-DNA is inserted at bp 813 in the 5th exon; its seed pool has not
yet been identified.

The mutant plants are then grown, and the plants evaluated for the effect of the
mutation on the plant. The effect of the mutation is assessed in germinating seedlings, in
growing and mature plants, and in flower and seed development. The assessment includes
evaluation of phenotypic appearance, growth rate (as determined by parameters such as dry
matter accumulation), time to reach different developmental stages, seed set and viability,
and biochemical analyses which include lipid and fatty acid analysis of various plant tissues.
Since there appears to be a second ACH homolog in the putative peroxisomal class of ACHs,
it is anticipated that ACH1 mutants will not be significantly different from wild-type plants if
the two genes in this class serve the same function. However, it is anticipated that these
mutants can be crossed with a plant carrying a T-DNA insertion in the ACH2 gene, thus
completely removing the activity of the enzymes of this class. This will be useful for further
evaluating the function of these genes in vivo. Any effects of this mutant may be ascribed to
the function of the ACH1.



EXAMPLE 4
Cloning and Characterization of ACH2
In this Example, experiments to clone and characterize ACH2 are described.

Cloning

Searching the Arabidopsis database with a homologous protein (for example, E.
coli TesB or human HIV-1 Nef-associated thioesterase) revealed a small portion of the
ACH2 cDNA. Two sequences were identified in the Arabidopsis database that showed high
homology with acyl-CoA thioesterase from other organisms. ACH2 was identified as one of
these sequences. This sequence was designated as "GSDB:S:3285641T04836|T04836|884
AT-NHC Arabidopsis thaliana cDNA clone Bl 19XP, length = 478." The alignment of E.
coli TesB (228 to 283) (SEQ ID NO:9) and the putative ACH2 sequence (81 to 248) (SEQ
ID NO:10) is shown below.





Alignment of E. coli TesB and Putative ACH2 Sequence

TesB   TIDHSMWFHRPFNLNEWLLYSVESTSASSARGFVRGEFYTQDGVLVASTVQEGVMR (SEQ ID NO:9)
       ++DH+MWF P +EWLLY + S+A RGFVG+ + + GLVS QE-H-R
ACH2   SLDHAMWFTDPLRADEWLLYVIVSPTAHETRGFVTGQMFNRKGELVVSLTQEALLR (SEQ ID NO:10)



Initially, this was the only cDNA sequence available, although there was also a
genomic clone that contained all but one end of the complete gene. However, the presence
of this genomic clone was initially unrecognized, because of poor quality sequencing data
that had been submitted for the clone (F21N11-Sp6 IGF Arabidopsis thaliana genomic clone
F32N11, genomic survey sequence). The cDNA clone that was in the database turned out to
be only the last 250 bp of the ACH2 cDNA. Using this sequence, primers were produced to
amplify a 500 bp genomic fragment which was then used to screen an Arabidopsis genomic
library for the ACH2 gene. A screen conducted using XL1-Blue MRA P2 cells (Stratagene),
according to the manufacturer's instructions, resulted in the isolation of a lambda FIX clone
that carried the ACH2 gene.

This gene was sequenced by starting at the known region and walking toward the
5' end using methods known in the art. Sequencing was considered to be complete when all
of the sequence homologous to ACH1 had been identified. By comparing the
ACH1 cDNA sequence to the ACH2 genomic sequence, the exons of the ACH2 gene were
predicted. Primers for the predicted 5' and 3' ends of the cDNA were developed in order to
amplify a full-length cDNA from Arabidopsis total RNA using Superscript One-Step
RT-PCR System (Gibco). Using poly-A RNA prepared from various tissues, full-length cDNAs
were successfully amplified. Full-length cDNA amplified from leaf poly-A RNA was then
ligated into pGEM-T Easy and the vector was used to transform DH10B E. coli cells, using
methods known in the art.

After sequencing the ACH2 cDNA amplified from leaf poly-A RNA, it was
determined that there was a 1 bp difference between that sequence and the genomic DNA.
As the genomic sequence was viewed as being more accurate, the root cDNA was subcloned
into pGEM-T Easy and the vector used to transform DH10B cells, in order to determine
whether it contained the correct sequence. Sequencing of this product indicated that it
contained errors as well, although they were different from the errors observed in the leaf
cDNA. This indicated that the Superscript Reverse Transcriptase used to amplify the cDNA
from the poly-A RNA was possibly faulty (in other words, that it introduced random errors
throughout the PCR product).

The remaining ACH2 cDNAs from various tissues (dry seed, rosette leaves, total
aerial, and silique tissues) were subcloned into pGEM-T Easy. Instead of transforming all of
these pGEM-T Easy/ACH2 ligations into DH10B cells, these cDNAs were directly
sequenced. Sequencing indicated that the cDNA amplified from the rosette leaf poly-A
RNA had the correct sequence (SEQ ID NO:2) when compared to the ACH2 genomic
sequence. The total aerial ACH2 cDNA contained a 1 bp mistake. It is contemplated that
these small mistakes will find use, in that they provide different sequences from the native
cDNA, but would result in only small changes at the protein level. Based on these
sequencing results, the pGEM-T/ACH2(rosette) vector was used to transform DH10B cells.
After the clones were sequenced, it was determined that there was a 4 bp insertion introduced
into the cDNA sequenced. This insertion was not present when the PCR products were
directly sequenced, but appeared following subcloning into the vector. This puzzling
development indicated that the correct ACH2 cDNA sequence remained to be identified and
cloned.

To create sticky-end products for subcloning ACH2 into pET-24d with a 6-His tag
in frame, the cDNA amplified from the rosette leaf poly-A RNA was used as a template.
DH10B cells were transformed with the resultant plasmid. One clone was identified with a
single bp change that caused no change at the amino acid level.

ACH2 was over-expressed as two fusion proteins: a fusion with a
maltose-binding protein (MBP) in E. coli (ACH2/MBP) and a fusion protein with six histidine
residues (ACH2/6His), as described below. Both proteins were then assayed for acyl-CoA
thioesterase activity, also as described below.

Overexpression and Purification of ACH2/MBP

The ACH2-6HIS/pET-24d plasmid (as described below) was used as a template to
create sticky-end products and introduce ACH2 into pMAL-c2G (New England Biolabs) (in
other words, a maltose-binding protein expression vector). This vector was introduced into
XL-1 Blue cells (Stratagene) and then into BL-21 Gold (DE3) cells. Because the host cells
may contain endogenous acyl-CoA thioesterase activity, E. coli cells expressing only the
MBP were used as a control for purification and characterization of the exogenous
Arabidopsis ACH2. The cells were lysed and the soluble portion was separated from the
insoluble portion by centrifugation. The ACH2-MBP fusion protein and the MBP protein
were then purified by passing the soluble extracts over amylose columns. Amylose binds
MBP, while contaminating proteins pass through the column. Both ACH2-MBP and MBP
were purified to nearly 90 percent purity in this way.
Overexpression and Purification of ACH2/6His 

The stop codon of the ACH2 coding region was removed so that it could be cloned
into the pET-24d plasmid in-frame with a string of six histidine residues. The string of
histidines allows for subsequent purification of ACH2 using Ni2+ affinity chromatography.
The pET-24d plasmid (Novagen) containing the ACH2 coding region was introduced into
chemically competent BL-21 Gold (DE3) E. coli cells (Stratagene) using techniques known
in the art.

The cells were grown in LB Medium until they reached an optical density between
0.5 and 0.6. They were then cooled to room temperature, and protein over-expression initiated
by adding IPTG to a final concentration of 800 µM. The cells were grown overnight at room
temperature, shaking at 300 rpm. The cells were then collected by centrifugation, and either
frozen slowly overnight at -20°C or frozen instantly with liquid nitrogen. The frozen cells
were resuspended in 1X Binding Buffer (all buffers used in this purification process were
made according to instructions in the Novagen His-Bind Kit manual except that the buffers
were made at pH 6.0 and not 7.9). The resuspended cells were lysed by sonication and
subsequently centrifuged at 20,000 x g for 30 min to separate the insoluble and soluble cell
fractions.

The soluble ACH2/6His was subsequently purified using Ni2+ affinity column
chromatography according to the protocols outlined by Novagen. This protocol allowed for
nearly complete purification of the ACH2/6His protein from other E. coli proteins.

Activity of ACH2

ACH2 was over-expressed as two fusion proteins: a fusion with a
maltose-binding protein (MBP) in E. coli (ACH2/MBP), and a fusion protein with six histidine
residues (ACH2/6His). Both proteins were assayed for acyl-CoA thioesterase activity.

Both ACH2/MBP and MBP were over-expressed and purified as described above.
The purified MBP control was important because although the MBP has no inherent
acyl-CoA thioesterase activity, some E. coli thioesterases may contaminate both purified proteins.
Assaying the purified MBP protein for acyl-CoA thioesterase activity was conducted in order
to determine how much of the total activity was due to contaminating E. coli proteins. This
"background" activity was then subtracted from the total acyl-CoA thioesterase activity of
the ACH2-MBP purified protein in order to determine how much activity is due solely to
ACH2-MBP.

The purified protein samples were assayed for acyl-CoA thioesterase activity
essentially as described above with radiolabeled acyl-CoA as a substrate; 0.5 µg of total
protein (determined using the Bradford assay, as known in the art) were incubated with
either radiolabeled 16:0-CoA or radiolabeled 18:0-CoA. The reaction mixture included
buffer containing 20 mM Tris-HCl, pH 7.2, 200 mM NaCl, and 1 mM EDTA in a total
volume of 32 µl; there were also trace amounts of maltose, which was used to elute the
proteins off of the amylose column. The assays were incubated at room temperature for 25
minutes, then stopped by the addition of 100 µl of 90 percent isopropanol/10 percent acetic
acid. 900 µl of hexane were added, and the mixture vortexed and then briefly centrifuged to
separate the hexane phase from the aqueous phase. An 800 µl aliquot was removed from the
hexane phase and added to 3 ml BCS, and the amount of radioactivity determined in a
scintillation counter.
The results, as shown in Figure 7, indicate that purified ACH2-MBP was 39 times
more active than the purified MBP with 16:0-CoA as the substrate, and 37.2 times more
active with 18:0-CoA as the substrate. Thus, these results demonstrate that ACH2-MBP
functions as an acyl-CoA thioesterase.
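
The background correction and fold comparison described for the MBP control amount to a subtraction and a ratio. The following is only a minimal sketch with hypothetical activity values in arbitrary units; the function name is an assumption, and no measured values from Figure 7 are reproduced.

```python
def net_and_fold(fusion_activity, control_activity):
    """Return (background-subtracted activity, fold activity over the MBP control).

    Both inputs must be in the same units, for example nmol fatty acid released
    per minute per mg protein (hypothetical units for this sketch).
    """
    if control_activity <= 0:
        raise ValueError("control activity must be positive to compute a fold value")
    net = fusion_activity - control_activity
    fold = fusion_activity / control_activity
    return net, fold


if __name__ == "__main__":
    # Hypothetical numbers chosen only to illustrate a 39-fold difference.
    print(net_and_fold(39.0, 1.0))
```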
ACH2/6His was over-expressed and purified as described above, and then assayed
for acyl-CoA thioesterase activity by measuring the amount of free CoA released, essentially
as described previously. The enzyme activity was measured with both standard BSA
concentrations, and with optimal BSA concentrations for different substrates. The optimal
BSA concentrations are as follows:

Substrate     Optimal BSA Concentration (µg/ml)
10:0-CoA      0
12:0-CoA      0
14:0-CoA      60
16:0-CoA      170
16:1-CoA      100
18:0-CoA      320
18:1-CoA      250
18:2-CoA      120



The results of the enzyme assays are shown in Figure 8. In Figure 8A, the enzyme
assays were conducted with 20 µM acyl-CoA substrate and 200 µg/ml BSA; in Figure 8B,
the enzyme assays were conducted with 20 µM acyl-CoA substrate and an optimized BSA
concentration, as indicated above. The results demonstrate that ACH2/6His possesses acyl
Coenzyme A thioesterase activity with acyl-CoA substrates of different chain-lengths.
The observed enzyme activity is dependent upon the concentration of BSA present in the
assay mixture. When the BSA level is held constant at 200 µg/ml, ACH2/6His exhibits a
specificity for longer-chain acyl-CoAs, with 18:1-CoA as the best substrate (see Figure 8A).
When an optimal concentration of BSA is added for each substrate, the preference of the
enzyme shifts toward smaller chain-lengths of acyl-CoAs, with 14:0-CoA and 16:1-CoA as
the best substrates (see Figure 8B). The activity of ACH2/6His with 20:0-CoA as the
substrate could not be determined, because the optimal BSA concentration is so high that it
hinders the assays.

Isolation of ACH2 Arabidopsis mutant

A population of thousands of Arabidopsis lines that carry random genomic
T-DNA insertions, housed at the University of Wisconsin, was screened for ACH2 mutants.
Using PCR, it is possible to screen through the thousands of plants in order to find a single
line that carries a T-DNA insertion in the gene of interest. Primers were constructed that
would be appropriate for screening the DNA collection for an ACH2 mutant, in accordance
with the recommendations of the Arabidopsis Knockout Facility. The following primer
sequences were used for the screening:

GSP2-2: ATTAAGTAGGAGGAAAATCCATCGTGACAG (SEQ ID NO:31)
GSP2-3: CGAGAAGAAATCACAGAATTGCTCAGATTAC (SEQ ID NO:32)

These primers were provided to the Knock Out facility at Wisconsin, where they
were used for screening. PCR reactions that were run at Wisconsin were screened by
Southern blotting in order to identify pools of DNA that contained DNA from an ACH2
knockout plant. The preliminary screen showed that at least two possible ACH2 T-DNA
mutants existed in their BASTA-resistant mutant population. These were in pool #3 and
pool #11. The #3 mutant was found using the GSP2-3 primer, while the #11 mutant was
identified using the GSP2-2 primer.

The secondary screen was less definite. This secondary screen narrows down the
number of plants that could be the ACH2 mutant to only 10 plants, but the PCR results were
inconclusive for both candidates. Southern blotting and PCR reamplification of PCR
products finally identified the seed pool of 10 in which the mutant in pool #3 resides. This
seed pool is designated Plate 20, Row F, Column 9 by the Arabidopsis Knock Out Facility.
This seed was planted and the individual ACH2 mutant plant was identified. The T-DNA
insertion is at bp 1251 (counting from the start codon) and is situated in the intron between
the 8th and 9th exons. This should give complete inhibition of gene expression.

The T-DNA mutant in pool #11 was also difficult to narrow down to the pool of
10 plants. The Knockout Facility at Wisconsin repeated the PCR screen three times before it
was possible to identify the seed pool of 10. The seed pool that was identified is designated
Plate 20, Row F, Column 9 by the Knockout Facility. The seeds were sown immediately
upon receipt. However, further screening has failed to identify an individual plant that has
the T-DNA insertion. This mutant has the T-DNA insertion at bp 2498 (counting from the
start codon) in the intron between the 14th and 15th exons. This should also give complete
inhibition of gene expression.
5 The mutant plants are then grown, and the plants evaluated for the effect of the 

mutation on the plant. The effect of the mutation is assessed in germinating seedlings, in 
growing and mature plants, and in flower and seed development. The assessment includes 
evaluation of phenotypic appearance, growth rate (as determined by parameters such as dry 
matter accumulation), time to reach different developmental stages, seed set and viability, 

10 and biochemical analyses which include lipid and fatty acid analysis of various plant tissues. 
Since there appears to be a second Acyl-CoA homolog in the putative peroxisomal class of 
ACHs, it is anticipated that the mutants will not be significantly different from wild-type 
plants if the two genes in this class serve the same function. However, it is anticipated that 
these mutants can be transfected with antisense ^CH7 and thus useful in further evaluating 

15 the function of these genes in vivo, in a manner similar to that described below in Example 
5. Any effects of the mutant may be ascribed to the function of the ACH2. 



EXAMPLE 5

Cloning and Characterization of ACH4 and ACH5

In this Example, cloning and characterization of ACH4 and ACH5 are described. Initially,
the amino acid sequence of a new mouse acyl-CoA thioesterase (Poupon et al. (1999)
J. Biol. Chem., 274:19188-19194) was used to BLAST the Arabidopsis database. This
resulted in two matches, both of which were large genomic clones. The database predicted a
protein for one of the genes, which was eventually demonstrated to be very close to the real
ACH5 sequence. No protein was predicted for the other gene, which was tentatively referred
to as "ACH4."

ACH4
Cloning

As with ACH5, ACH4 was identified by its homology to mouse thioesterase. The gene was
in the Arabidopsis database as a large genomic clone. Initially, the putative sequence of the
ACH5 cDNA was compared to the ACH4 gene in order to predict the positions of the open
reading frames. However, ACH4 and ACH5 lost their homology towards the end of the
genes, which left the real 5' end of ACH4 in doubt. The putative ACH4 sequence was
designated "GSDB:S:4681120|AP000372|AP000372|Arabidopsis thaliana genomic DNA,
chromosome 5, TAC clone: K23F3, length 36,824."

Subsequently, a 1.1 kb fragment of cDNA was amplified using total aerial poly-A RNA as a
template in RT-PCR, using the Superscript One-Step RT-PCR System (Gibco). These
reactions were then reamplified by traditional PCR to obtain a visible product. No further
cDNA sequence could be determined despite various attempts at RT-PCR with primers
based on genomic sequence.

The ACH4 cDNA present in pSPORT1 (Genome Systems) was sequenced. This clone was
designated as "GSDB:S:4959535|AI999441|AI999441|701555896| Arabidopsis thaliana,
Columbia Col-0, rosette-3 Arabidopsis thaliana cDNA clone 701555896, mRNA sequence,
length = 449." The ACH4 cDNA in pSPORT1 was then used as a template for amplifying an
insert and subcloning ACH4 into pET-24d. The resultant vector was then introduced into
DH10B cells. It is further contemplated that binary vectors will find use in transforming
plant tissue.

ACH4-Antisense Expression in ACH5 T-DNA Mutant Background

In order to determine the function of both ACH4 and ACH5, an antisense expression vector
was created for ACH4 which could be used to transform ACH5 T-DNA mutants (described
below). ACH4 and ACH5 represent the two members of the second class of acyl-CoA
thioesterases of the present invention, and are putatively localized to the mitochondria,
where they are thought to be involved in lipid synthesis. There appear to be no other
members of this class in Arabidopsis. Therefore, expressing the ACH4 antisense construct in
an ACH5 T-DNA mutant would give a complete knockout of this class of enzymes under
optimal conditions. Since antisense expression often has variable effectiveness, any
phenotype resulting from the experiment will most likely have varying effects, from minimal
to severe, and thus shed light on the role of the enzymes in the plant.

The ACH4 cDNA was removed from the pET-24d vector by restriction digest and introduced
into a binary vector. The antisense orientation of the cDNA in relation to the promoter was
confirmed by PCR. The promoter, antisense cDNA, and terminator were then placed into a
plant transformation plasmid that can be transferred from Agrobacterium and stably
integrated into the plant genome. The plasmid also carries a BASTA resistance marker and
the T-DNA flanking regions for effective T-DNA transfer and insertion.
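
As an illustration of what "antisense orientation" means at the sequence level, the sketch
below checks whether the strand read downstream of a promoter matches a cDNA directly
(sense) or as its reverse complement (antisense). It is a hypothetical in silico check with
made-up function names and a 10-nucleotide toy fragment; the orientation in this Example
was confirmed experimentally by PCR, not by this procedure.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def orientation(insert_read: str, cdna: str) -> str:
    """Classify an insert, read 5'->3' downstream of the promoter.

    Returns "sense" if the read equals the cDNA, "antisense" if it equals
    the reverse complement of the cDNA, and "unknown" otherwise.
    """
    insert_read, cdna = insert_read.upper(), cdna.upper()
    if insert_read == cdna:
        return "sense"
    if insert_read == reverse_complement(cdna):
        return "antisense"
    return "unknown"


# Toy fragment used purely for demonstration.
fragment = "ATGAGATCTT"
print(reverse_complement(fragment))
print(orientation(reverse_complement(fragment), fragment))
```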

The ACH4-antisense plasmid was then introduced into an Agrobacterium strain, which was
grown overnight, collected by centrifugation, and used to transform ach5 mutant
Arabidopsis plants by the floral dip method. The floral dip method involves inverting the
plants to dip them into the Agrobacterium/dipping media solution for about 10 seconds. The
plants were then grown to maturity and the seed harvested. Seeds from the transformed
plant were sown in BASTA-containing soil, and seedlings that survived the BASTA
treatment were transferred to regular soil after two weeks. The plants are grown to
maturity, and the effect of the antisense expression is assessed in germinating seedlings,
growing and mature plants, and in flower and seed development. The assessment includes
evaluation of phenotypic appearance, growth rate (as determined by parameters such as dry
matter accumulation), time to reach different developmental stages, seed set and viability,
and biochemical analyses which include lipid and fatty acid analysis of various plant tissues.
It is anticipated that the antisense construct will have varying and detrimental effects upon
the plant.

Isolation of ACH4 Arabidopsis mutant

In addition to the antisense-expression studies, a T-DNA mutant from the BASTA
population of T-DNA mutants at the Arabidopsis Knockout Facility at the University of
Wisconsin is identified and isolated, in a manner similar to that described for ACH1 and
ACH2.



ACH5
Cloning

A genomic clone was identified in the Arabidopsis database which contained the ACH5
gene sequence. The database reference is
"GSDB:S:1634607|AC002340|AC002340|Arabidopsis thaliana chromosome II section 173
of 225 of the complete sequence. Sequence from clones T6B20, T11J7, length = 79676."
Genefinder predicted a gene and protein product in this region. Using primers based on the
predicted cDNA (Genefinder), the Superscript One-Step RT-PCR System (Gibco) was used
to amplify what was believed to be full-length cDNA for ACH5.

The amplified cDNA was subcloned into pPCR-Script AMP (Stratagene), a blunt-end
cloning vector. The sequencing of full-length cDNA was then completed. The sequence
was found to code for a protein that is nearly identical to the one predicted by Genefinder,
with the exception of a small region towards the N-terminus. The ACH5 cDNA was then
subcloned into pET-24d in order to add a 6-His tag. The resultant plasmid was then
introduced into DH10B cells and BL-21 Gold (DE3) cells. In addition, the vector was also
placed into C41 and C43 cells, which are E. coli strains that are proficient at expressing
typically insoluble proteins in soluble form.
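
Comparing a sequenced cDNA with a predicted protein, as described above, amounts to
translating the open reading frame and listing the positions that differ. The following is a
minimal sketch using the standard genetic code; the function names are illustrative, the
comparison is position-by-position without alignment (so it only suits sequences of equal
register), and the test fragment is the first 30 nucleotides shown in Figure 1 rather than
the ACH5 sequence itself.

```python
# Standard genetic code, built in TCAG order for each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cdna: str) -> str:
    """Translate a cDNA from its first base until the first stop codon.

    Spaces (as used in the figure layout) are ignored; codons containing
    anything other than A, C, G, T are reported as "X".
    """
    cdna = cdna.upper().replace(" ", "")
    protein = []
    for i in range(0, len(cdna) - 2, 3):
        residue = CODON_TABLE.get(cdna[i:i + 3], "X")
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def mismatches(predicted: str, observed: str) -> list:
    """List (position, predicted residue, observed residue), 1-based."""
    return [
        (pos, p, o)
        for pos, (p, o) in enumerate(zip(predicted, observed), start=1)
        if p != o
    ]


print(translate("ATGAACACTG AATCAGTTGT CGAGTTCCTT"))
```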

Activity of ACH5

ACH5-6His was over-expressed in E. coli using the pET24d expression vector as 
discussed above and known in the art, and the activity measured by using radiolabeled acyl- 
CoA substrates, essentially as described previously. 

The cells in which ACH5-6His was over-expressed were lysed and the soluble fraction
separated from the insoluble fraction by centrifugation. As a control, soluble extract from
E. coli cells carrying only an empty pET24d vector was collected in the same way. The
acyl-CoA thioesterase activity of the soluble extract of the ACH5-6His producing cells was
then compared with that of the empty vector soluble extract.

Acyl-CoA thioesterase activity was determined by incubating 0.1 µg of soluble protein
(determined using the Bradford assay, as known in the art) with radiolabeled 16:0 CoA. The
reaction volume was kept at 32 µl in buffer containing 20 mM Tris-HCl, pH 7.2, 200 mM
NaCl, and 1 mM EDTA. The reaction mixture was incubated at room temperature for 25
minutes, then terminated with 100 µl of 90 percent isopropanol/10 percent acetic acid.
900 µl of hexane were added, and the mixture vortexed and briefly centrifuged to separate
the hexane from the aqueous phase. An 800 µl aliquot of the hexane phase was removed and
added to 3 ml BCS. The relative activity of the purified pET24d/ACH5-6His and pET24d
soluble extracts was determined by counting the radioactivity in the BCS on a scintillation
counter.

The results, as shown in Figure 9, indicate that the pET24d/ACH5-6His soluble extract was
4.7 times more active than the extract from cells containing empty pET24d vector.

These results demonstrate that ACH5-6His functions as an acyl-CoA thioesterase. 
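
The relative activity reported above is a ratio of scintillation counts. A minimal sketch of
that arithmetic follows. The counts used in the usage lines are hypothetical values chosen
only to reproduce a 4.7-fold ratio; they are not data from this Example. The aliquot
correction assumes the hexane phase volume equals the 900 µl of hexane added, which
ignores any partitioning of the isopropanol.

```python
def total_hexane_cpm(aliquot_cpm: float,
                     aliquot_ul: float = 800.0,
                     hexane_ul: float = 900.0) -> float:
    """Scale counts from the counted aliquot up to the whole hexane phase."""
    return aliquot_cpm * hexane_ul / aliquot_ul


def fold_activity(sample_cpm: float, control_cpm: float) -> float:
    """Activity of the sample extract relative to the empty-vector extract."""
    if control_cpm <= 0:
        raise ValueError("control counts must be positive")
    return sample_cpm / control_cpm


# Hypothetical counts per minute, for illustration only.
ach5_cpm = total_hexane_cpm(4700.0)
empty_vector_cpm = total_hexane_cpm(1000.0)
print(round(fold_activity(ach5_cpm, empty_vector_cpm), 1))
```

Because both extracts are sampled with the same 800 µl aliquot, the aliquot correction
cancels in the ratio; it matters only if absolute amounts of released fatty acid are needed.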

Localization of ACH5

The ACH5-His/pET-24d vector is also used as a template to create an insert for subcloning
ACH5 into a green fluorescent protein expression vector. This vector is transfected into
DH10B cells. The fusion protein expression product is then localized ultimately in the
mitochondria, indicating that ACH5 is a mitochondrial enzyme.

Isolation of ACH5 Arabidopsis mutant

The ACH5 T-DNA mutant which was used in the ACH4-antisense expression experiment
as described above was found in the ALPHA population of the Arabidopsis Knockout
Facility using their protocols and methods. The primers used for the screen were the
following:

GSP5-5: ACTAGGCACTACTTAGGCAGGAATGAAAG (SEQ ID NO:45)
GSP5-6: TCTATCACAGAGGGAAAGAATGATCAAAC (SEQ ID NO:46)
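
Basic properties of the two screening primers listed above, such as length and GC content,
can be tabulated with a few lines of code. This is an illustrative sketch only; the helper
name is an assumption, and no primer design criteria are stated in this Example.

```python
# Primer sequences as given above (SEQ ID NO:45 and SEQ ID NO:46).
PRIMERS = {
    "GSP5-5": "ACTAGGCACTACTTAGGCAGGAATGAAAG",
    "GSP5-6": "TCTATCACAGAGGGAAAGAATGATCAAAC",
}


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a primer."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


for name, seq in PRIMERS.items():
    print(f"{name}: length {len(seq)} nt, GC {gc_fraction(seq):.1%}")
```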

The rounds of screening were similar to those described for ACH1 and ACH2.

The T-DNA mutant identified was identical to wild-type Arabidopsis in every way 
examined. This included both the physical characteristics of the plant as well as the lipid 
profile of the lipids stored in the seeds. These results are not surprising, since there is a close
acyl-CoA thioesterase homologue, ACH4, which probably serves a redundant role to ACH5. 
If these enzymes play an important role in the lipid metabolism of the plant, then a double- 
knockout in which both genes are impaired, as described above, will be informative. 

Various modifications and variations of the described compositions and methods 
of the invention will be apparent to those skilled in the art without departing from the scope
and spirit of the invention. Although the invention has been described in connection with 
particular preferred embodiments, it should be understood that the inventions claimed should 
not be unduly limited to such specific embodiments. Indeed, various modifications of the 
described modes for carrying out the invention which are obvious to those skilled in the art 
and in fields related thereto are intended to be within the scope of the following claims. 

1. An isolated nucleic acid sequence comprising SEQ ID NO:1 or SEQ ID NO:2 or SEQ ID
NO:3 or SEQ ID NO:4 or a sequence that hybridizes to at least one of the foregoing
sequences under conditions of medium to high stringency.

2. An isolated nucleic acid sequence encoding a protein comprising amino acid sequence
SEQ ID NO:5 or SEQ ID NO:6 or SEQ ID NO:7 or SEQ ID NO:8.

3. The nucleic acid sequence of Claims 1-2, wherein the sequence is operably linked to a
heterologous promoter.

4. The nucleic acid sequence of Claims 1-3, wherein the sequence is contained within a
vector.

5. The vector of Claim 4, wherein the nucleic acid sequence is in a sense orientation.

6. The vector of Claim 4, wherein the nucleic acid sequence is in an antisense orientation.

7. A composition comprising the nucleic acid sequence of Claims 1-3.

8. A host cell transfected with a nucleic acid sequence or a vector or a composition
according to Claims 1-7.

9. A plant transfected with a nucleic acid sequence or a vector or a composition according
to Claims 1-7.

10. A seed from the transgenic plant of Claim 9.

11. Oil from the transgenic plant of Claim 9.

12. A nucleic acid sequence or a composition or a vector according to Claims 1-7 for use in
altering a phenotype of a plant.

13. A method for making a transgenic plant comprising

a) providing a nucleic acid sequence or a vector or a composition according to Claims
1-7, and plant tissue,

b) and transfecting the plant tissue with the nucleic acid sequence or the vector or the
composition under conditions such that a transgenic plant is generated.

14. A method for altering a phenotype of a plant comprising

a) providing a nucleic acid sequence or a vector or a composition according to Claims
1-7, and plant tissue,

b) and transfecting the plant tissue with the nucleic acid sequence or the vector or the
composition under conditions such that a transgenic plant is generated and the phenotype
is altered.

15. A method for producing variants of acyl-CoA thioesterases comprising:

a) providing a nucleic acid sequence according to Claims 1-3,

b) mutagenizing the nucleic acid sequence; and

c) screening a variant encoded by the mutagenized nucleic acid sequence for activity.

16. A nucleic acid sequence encoding one or more plant acyl-CoA thioesterase motifs,
wherein the motif is a cGMP binding domain comprising SEQ ID NO:11 or SEQ ID NO:12.

17. An isolated nucleic acid sequence encoding a plant acyl-CoA thioesterase, wherein the
plant acyl-CoA thioesterase competes for binding to an acyl-CoA substrate with a protein
encoded by a second nucleic acid sequence and wherein the second nucleic acid sequence
comprises a nucleic sequence according to Claims 1-2.

18. A composition comprising a first nucleic acid sequence, wherein the first nucleic acid
sequence inhibits the binding of at least a portion of a second nucleic acid sequence to its
complementary sequence and wherein the second nucleic acid sequence has a nucleic
sequence according to Claims 1-2.

19. A purified protein comprising an amino acid sequence encoded by a nucleic acid
sequence of Claims 1-2.

20. A composition comprising a protein of Claim 19.

21. A compound according to Claims 1-4 substantially as described herein in any of the
examples.

FIGURE 1



1 ATGAACACTG AATCAGTTGT CGAGTTCCTT GGGAATGTGA CCTTGTTGCA 
51 GCGGTTACCT AGTTCCTCTC TGAAGAGAAT CTCCGAAGTC GTTGTCTTCA 
101 AAGGTTATGA CAGAGGTGAT TATGTGGTTC GTGAAAATCA AAATGTGGAT 
151 GGAGTTTATT TTCTCTTGCA AGGACAGGCT CAGGTTCTGA GATCAGCCGG 
201 AGAGGAAAAC TATCAAGAGT TCCCTTTGAA ACGATATGAT TTCTTCGGCC 
251 ATGGTATTTT CGGGGATGTT TACTCAGCAG ATGTTGTTGC TGTGACAGAG 
301 CTTACCTGCT TGCTGTTGAT GTCTGATCAT CGTGCTTTAC TTGAAATAAA
351 GTCAGTCTOT GATTCAGATA AGGAACGCTG TCTTGTGGAA GACATACTAT 
401 ATCTAGAACC ATTAGATTTG AATGTATACC GGGGGTTCAC CCCACCTAAT 
451 GCTCCAACCT ATGGAAAGGT TTATGGAGGG CAATTAGTTG GACAGGCACT 
501 TGCCGCAGCA TCAAAAACTG TTGAAACTAT GAAGATAGTC CATAATTTTC 
551 ATTGCTATfT CCTTCTTGTT GGAGATATAA ATATTCCCAT CATATATGAT 
601 GTTAACCGCT TACGTGACGG CTUlCAACTTT GCCACCAGAA GTGTAGATGC 
651 TAGACAGAAA GGAAAARCTA TATTCACCTT GTTCGCGTCA TTTCAGAAAA 
701 AGCAACAAGG TTTTATTCAC CAGGAGTCGA CCATGCCTCA TACACCAGCT
751 CCTGAAACGC TTCTACCAAG GGAGGAGATG CTTGAACGGC TTGTTACTGA 
801 GCCTCTGCTA CCTAGGGATT ACCGAAACCA AGTTGCAACT GA7UVTTAGTG 
851 TTCCATTCCC TATAGATATT CGATTTTGTG AGCCAAATCG TTCCACTAAA 
901 CAGAATAAGT CTCCTCCAAG ACTAAAATAT TGGTTTAGAG CAAAGGGAAA 
951 ACTTTCTGAT GATGATCAAG CTTTGCACAG ATGTGTGGTT GCATTTGCTT 
1001 CCGATTTGAT ATTCGCCACT ATCAGTTTAA ACCCTCACCG GAGAGAGGGC
1051 ATGAGTGTAG CTGCTCTTAG CCTGGACCAC TCX3ATGTGGT TCCACCGACC 
1101 TGTAAGAGCA GATGATTGGT TGTTGTTTGT GATAGTGAGT CCAACTGCGA
1151 CCGAAAGCCG CGGTTTTGCA ACTGGCAAAA TGTTCAACAG AAAGGGAGAG 
1201 CTGGTGGTAT CATTGACGCA AGAAGCTGTG TTAAGAGAAG CTGTGACTAT
1251 TAAGCCATCC TTCGGGGCCA AGCTATGA 
FIGURE 2



1 ATGAACACCG AGTCAGTTGT CGAGTTCCTT GGAAACGTCC CTTTACTCCA 
51 GAAATTACCG AGCTCTTCTC TGAAGAAAAT TGCTCAAGTT GTTGTGCCGA 
101 AACGTTACGG GAAAGGGGAT TATGTGGTTC GTGAAGATCA AACTTGGGAT 
151 GGATGTTATT TTATTTTGCA AGGAGAGGCT CAGGTTTCCG GACCAGACGA 
201 AGAAGATAAC CGTTCCGAAT TCCTCTTGAA ACAGTATGAT TACTTCGGCG 
251 TTGGTTTATC TGGGAACGTT CATTCAGCAG ACATTGTTGC TATGTCCCAG 
301 CTTACGTGTT TGGTGTTGCC GCGTGATCAC TGTCATTTGC TTGAGACTAA
351 TTCCATCTGG CAATCGGATA CGTCACTTGA CAAATGCTCT TTAGTGGAAC 
401 GCATTTTGCA ACTCGACCCT TTAGAATTGA ATATCTTCAG GGGGATCACA 
451 CTACCTGATG CTCCTATATT TGGAAAGGTT TTTGGAGGAC AATTTGTCGG 
501 ACAGGCTCTT GCTGCGGCGT CAAAAACTGT TGATTTTCTC AAGGTCGTCC 
551 ACAGTTTACA CTCCTATTTT CTTCTCGTTG GAGATATTGA CATCCCCATT 
601 ATTTATCAAG TTCACCGCAT ACGTGATGGG AACAACTTTG CCACCCGAAG 
651 AGTTGATGCA GTACAGAAAG GAAATATCAT ATTCATCTTG CTGGCATCAT 
701 TTCAGAAAGA ACAACAAGGA TTTGAGCACC AGGAGTCAAC CATGCCCTCT
751 GTACCAGATC CTGATACGCT TCTATCACTG GAGGAGTTGC GTGAAAGCCG 
801 TATAACTGAT CCTCATTtAC CGAGGAGTtA TCGG7UVCAAA GTTGCAACTA 
851 GAAACTTtSt TCCATGGCCT ATAGAGATtC GATTTTgtGa GCCCAGCAAT 
901 TCAACAAATC AGACAAAGTC TCCTCCGAGA TTGAACTATT GGTTTAGAGC 
951 AAAGGGAAGG CTTTCTGATG ATCAAGcTTT GCACCGAtGT GTCgTTGCAT 
1001 TTGCCTCGGA TTTAATATTT TGTGGTGTTG GTtTGAACCC tCACCGGAgA 
1051 AAAGGGGTGA AATCCGCCGC ACTTAGCCTG GAcCATGCGA TGTGGTTTCA 
1101 CCGACCTCTA AGAGCCGACG AgTGGtTGCT CTATGTGATT GTGAGTCCAA
1151 CTGCGCATGA GACCCGTGGT TTTGTGACTG GTCAAATGTT CAACAGGAAA 
1201 GGAGAGCTGG TTGTATCATT AACGCAAGAG GCCCTATTAA GAGAGGCTAG 
1251 ACCTCCAAAA CCCTCCGGCA CGTCGAAGCT CTGA 
FIGURE 3



1 ATGAATTCCC CAAGACCCAT CTCCGTCGTC TCCACCTTCG CTTCACCATC 
51 CTCCACCTCT GACCCAACCC GAAAACCCTT AAGCTTATGG CCAGGTATGT 
101 ACCATTCACC TGTCACCACC GCTCTTTGGG AAGCCAGATC CAAAATCTTC 
151 GAATCTCTTC TCGATCCTCC TAAAGACGCT CCACCACAGA GCCAGCTTCT 
201 CACCAGAACT CCTTCTCATA GCCGTACCAC CATTTTCTAC CCTTTCTCCA 
251 CTGATTTCAT TCTCAGAGAG CAATACAGAG ATCCTTGGAA CGAGGTCCGT 
301 ATCGGGATCT TGCTCGAAGA TCTCGATGCT TTAGCCGGCA CTATCTCCGT 
351 CAAGCATTGT TCTGATGATG ATAGTACAAC GAGGCCACTC TTGCTTGTTA 
401 CAGCTTCTGT TCATAAGATT GTCTTAAAGA AGCCGATTTG TGTCGACATT
451 GATCTCAAGA TTGTTGCTTC TGTCATTTGG GTTGGTCGTT CATCTATCGA 
501 GATTCAACTC GAAGTTATGC AATCCGAATT GAAAGATGTC AAGGCTTCTT 
551 CAGATTCTGT TGCTTTAACT GCAAATTTCA TTTTTGTGGC GCGTGATTCT
601 AAGACTGGTA AAGCTGCTCC TATCAACCGT CTTTCTCCTG AAACCX3AGGT 
651 TGAGAAACTT CTCTTTGAGG AAGCTGAAGC TAGGAATAAC TTAAGGAAGA 
701 AGAAGAGAGG AGGTGACAGA AGGGAATTTG ATCACGGGGA GTGTAAGAAG 
751 CTTGAGGC-fr GGTTGGCTGA AGGAAGGATT TTCTCTGACA TGCCAGCTCT 
801 TGCTGACAGA AACAGCATTC TTCTTAAAGA CACTCGTCTT GAGAATTCGC 
851 TGATATGCCA ACCGCAGCAG AGGAACATCC ATGGTAGGAT CTTTGGAGGG 
901 TTTTTAATGC ATAGAGCATT TGAGTTGGCT TTCTCTACTG CTTATACGTT 
951 TGCGGGTCTA GTGCCTTACT TCTTAGAAGT TGATCACGTC GATTTCCTAA 
1001 GACCGGTGGA CGTCGGGGAT TTCTTACGTT TCAAATCCTG TGTTCTTmC 
1051 ACTCAACTGG ATAAACAGGA TTGTCCGCTC ATCAATATCG AAGTTGTTGC 
1101 TCATGTTACA AGTCCAGAGA TTCGCTCCAG TGAGGTTTCA AATACATTCT 
1151 ACTTCAAGTT CACTGTAAGG CCAGAGGCAA AGGCCAGAAA CAATGGGTTT 
1201 AAACTTCGAA ATGTAGTTCC TGCCACAGAG GAAGAAGCTC GTCATATCCT 
1251 CGAGCX3CATG GATGCAGAAG CTTTGAAGTC AAGCAAACAA CAATGTGTAG 
1301 GCACAATTCT TCAGTAA 
FIGURE 4



1 ATGAGATCTT CAGCGGG7VAA GCTTCTCTCC CATTCTCTAC GGAGAAACCG 
51 CGGGAAACTT GGTGATGGAT TTTGGATTCC GGCTACGAGG TCTCAGAAGA 
101 TGTGGAATTC GACGGTACCG TCAGATGAAA ATCCTGATCA AAATTCGATC 
151 GATGCAGGAT CTTCAATGCG GAAACCAATT AGCTTATGGC CTGGAATGTA 
201 TCATTCTCCG GTGACTAATG CGCTTTGGGA AGCTCGTCGT AATATGTTCG 
251 AAATTCCTAC CGGAGACGAC GCTTCTCAGT CAAAATTGAC GGCTAAATCT 
301 CCGTCTCGAA GCCGAACGTC AATTCTCTAT AAGTTTTCTT CTGATTTTGT 
351 TCTTCGAGAA CAATATAGGA ATCCTTGGAA TGAGATTCGT ACTGGTAAAT
401 TGGTTGAAGA TCTTGACGCT CTTGCTGGAA CCATCTCCTT CAAGCATTGT
451 GGTGGTGACA GCAGTGCGAG ATCGATGATT TTGGTGACTG CTTCAGTGGA
501 TAGGATTATC ATGAAAAGAC CTATTCGTGT AGATACTGAC CTTAGTATAG 
551 TTGGTGCTGT TACATGGGTT GGTAGATCAT CCATGGAGAT GCAATTACAA 
601 GTACTCCAAA TTCAAGATAC CAACAACTCC TCTGAGTCGG TTGCCCTTGA 
651 AGCAAACTTT ACATTCGTGG CTCGGGATGC TCAGACAGGC AAGTCAGCTC 
701 CAATTAACCA AGTCGTGCCA GAAACTGAGC ATGAAAAATT TCTGTGGAAA 
751 GAGGCAGAAG AGAGGAACAA ACTACGAAAA CAAAAGAGAG CACAGGGTAA
801 AGAAGAGCAT GAGAAGTTGA AGGACTTAGA GAGGCTAAAT GAGTTATTAG 
851 CAGAAGGAClG AGTATTCCTG GACATGCCAG CTCTTGCAGA TCGGAACAGC 
901 ATCCTAATTA AGGATACTTC CCATGAAAAC TCATTGATCT GCCAGCCACA 
951 GCAACGAAAT ATTCATGGCA GAATTTTTGG AGGCTTCTTG ATGCGCAAAG 
1001 CTTTTGAGTT GGCTTTCTCC AATGCTTACA CCTTCGCTGG GGTTTCACCA 
1051 CGTTTTCTCG AAGTTGATCG CGTCGATTTT ATCAAACCAG TGGATGTTGG 
1101 AAATTTCCTT CGTTTCAAGT CACGCGTATT GTACACAGAA GCAACCAGTT 
1151 CAGCTGAACC GCTGATAAAC ATCGAGGTTG TAGCTCATGT TACAAGTCCA 
1201 GAGCTCAGAT CCAGCGAAGT CTCCAACAGG TTTTACTTCA CCTTCAGTGT 
1251 GAGACCTGAG GCCATGAAAG ATGGATTGAA AATAAGAAAT GTGGTTCCAG 
1301 CAACAGAAGA AGAAGCTAGA AGAGTGATCG AGCGCATGGA CGCAGAGAGA 
1351 CCCATCTCAT TGCCTTGA 
FIGURE 5



1 MNTESVVEFL GNVTLLQRLP SSSLKRISEV VVFKGYDRGD YVVRENQNVD
51 GVYFLLQGQA QVLRSAGEEN YQEFPLKRYD FFGHGIFGDV YSADVVAVTE
101 LTCLLLMSDH RALLEIKSVS DSDKERCLVE DILYLEPLDL NVYRGFTPPN
151 APTYGKVYGG QLVGQALAAA SKTVETMKIV HNFHCYFLLV GDINIPIIYD
201 VNRLRDGNNF ATRSVDARQK GKTIFTLFAS FQKKQQGFIH QESTMPHTPA
251 PETLLPREEM LERLVTEPLL PRDYRNQVAT EISVPFPIDI RFCEPNRSTK
301 QNKSPPRLKY WFRAKGKLSD DDQALHRCVV AFASDLIFAT ISLNPHRREG
351 MSVAALSLDH SMWFHRPVRA DDWLLFVIVS PTATESRGFA TGKMFNRKGE
401 LVVSLTQEAV LREAVTIKPS FGAKL

1 MNTESVVEFL GNVPLLQKLP SSSLKKIAQV VVPKRYGKGD YVVREDQTWD
51 GCYFILQGEA QVSGPDEEDN RSEFLLKQYD YFGVGLSGNV HSADIVAMSQ
101 LTCLVLPRDH CHLLETNSIW QSDTSLDKCS LVERILQLDP LELNIFRGIT
151 LPDAPIFGKV FGGQFVGQAL AAASKTVDFL KVVHSLHSYF LLVGDIDIPI
201 IYQVHRIRDG NNFATRRVDA VQKGNIIFIL LASFQKEQQG FEHQESTMPS
251 VPDPDTLLSL EELRESRITD PHLPRSYRNK VATRNFVPWP IEIRFCEPSN
301 STNQTKSPPR LNYWFRAKGR LSDDQALHRC VVAFASDLIF CGVGLNPHRR
351 KGVKSAALSL DHAMWFHRPL RADEWLLYVI VSPTAHETRG FVTGQMFNRK
401 GELVVSLTQE ALLREARPPK PSGTSKL
FIGURE 6



1 MNSPRPISVV STFASPSSTS DPTRKPLSLW PGMYHSPVTT ALWEARSKIF
51 ESLLDPPKDA PPQSQLLTRT PSHSRTTIFY PFSTDFILRE QYRDPWNEVR
101 IGILLEDLDA LAGTISVKHC SDDDSTTRPL LLVTASVHKI VLKKPICVDI
151 DLKIVASVIW VGRSSIEIQL EVMQSELKDV KASSDSVALT ANFIFVARDS
201 KTGKAAPINR LSPETEVEKL LFEEAEARNN LRKKKRGGDR REFDHGECKK
251 LEAWLAEGRI FSDMPALADR NSILLKDTRL ENSLICQPQQ RNIHGRIFGG
301 FLMHRAFELA FSTAYTFAGL VPYFLEVDHV DFLRPVDVGD FLRFKSCVLY
351 TQLDKQDCPL INIEVVAHVT SPEIRSSEVS NTFYFKFTVR PEAKARNNGF
401 KLRNVVPATE EEARHILERM DAEALKSSKQ QCVGTILQ

1 MRSSAGKLLS HSLRRNRGKL GDGFWIPATR SQKMWNSTVP SDENPDQNSI
51 DAGSSMRKPI SLWPGMYHSP VTNALWEARR NMFEIPTGDD ASQSKLTAKS
101 PSRSRTSILY KFSSDFVLRE QYRNPWNEIR TGKLVEDLDA LAGTISFKHC
151 GGDSSARSMI LVTASVDRII MKRPIRVDTD LSIVGAVTWV GRSSMEMQLQ
201 VLQIQDTNNS SESVALEANF TFVARDAQTG KSAPINQVVP ETEHEKFLWK
251 EAEERNKLRK QKRAQGKEEH EKLKDLERLN ELLAEGRVFL DMPALADRNS
301 ILIKDTSHEN SLICQPQQRN IHGRIFGGFL MRKAFELAFS NAYTFAGVSP
351 RFLEVDRVDF IKPVDVGNFL RFKSRVLYTE ATSSAEPLIN IEVVAHVTSP
401 ELRSSEVSNR FYFTFSVRPE AMKDGLKIRN VVPATEEEAR RVIERMDAER
451 PISLP



WO 02/08433

PCT/US01/22907



FIGURE 7

[Bar chart. Substrates: 16:0-CoA, 18:0-CoA. Legend: MBP, ACH2-MBP.]
FIGURE 8

ACH2-6His Specific Activity

Panel A. ACH2/6His specific activity.
Reaction Conditions:

■ 20 µM acyl-CoA
■ 200 µg/ml BSA
■ 200 µM DTNB

All reactions run at room temperature in 900 µl format. Error bars reflect the
standard error of three assays.

Panel B. ACH2/6His specific activity.
Reaction Conditions:

■ 20 µM acyl-CoA
■ Optimized BSA (0 - 320 µg/ml)
■ 200 µM DTNB

All reactions run at room temperature in 900 µl format. Error bars reflect the
standard error of three assays.
FIGURE 9

[Bar chart. Legend: pET24d, pET24d/ACH5-6His.]